Abstract
Artificial neural networks (ANNs) have advanced rapidly owing to their success in various application domains, including autonomous driving and drone vision. Inspired by the mechanisms of the biological brain, researchers have been improving the performance, efficiency, and computational requirements of ANNs. Spiking neural networks (SNNs) provide a power-efficient and brain-inspired computing paradigm for machine learning applications. However, evaluating large-scale SNNs on classical von Neumann architectures (central processing units/graphics processing units) demands a large amount of power and time. Therefore, hardware designers have developed neuromorphic platforms to execute SNNs in an approach that combines fast processing and low power consumption. Recently, field-programmable gate arrays (FPGAs) have been considered promising candidates for implementing neuromorphic solutions because of their advantages, such as higher flexibility, shorter design time, and excellent stability. This review describes recent advances in SNNs and the neuromorphic hardware platforms (digital, analog, hybrid, and FPGA based) suitable for their implementation. We present the biological background of SNN learning, such as neuron models and information encoding techniques, followed by a categorization of SNN training approaches. In addition, we describe state-of-the-art SNN simulators. Furthermore, we review and present FPGA-based hardware implementations of SNNs. Finally, we discuss some future directions for research in this field.
1 Introduction
In recent years, artificial neural networks (ANNs) have become the best-known approach in artificial intelligence (AI) and have achieved superb performance in various domains, such as computer vision (Abiodun et al., 2018), automotive control (Kuutti, Fallah, & Bowden, 2020), flight control (Gu, Valavanis, Rutherford, & Rizzo, 2019), and medical systems (Shahid, Rappon, & Berta, 2019). Taking inspiration from the brain, the third generation of neural networks, known as spiking neural networks (SNNs), has been developed to bridge the gap between machine learning and neuroscience (Maass, 1997). Unlike ANNs that process data values, SNNs use discrete events (or spikes) to encode and process data, which makes them more energy efficient and more computationally powerful than ANNs (Jang, Simeone, Gardner, & Gruning, 2019).
SNNs and ANNs differ in their neuron models. ANNs typically use memoryless computational units with continuous activation functions, such as the sigmoid, rectified linear unit (ReLU), or tanh, whereas SNNs use nondifferentiable neuron models with internal state (memory), such as the leaky integrate-and-fire (LIF) neuron. However, simulating large-scale SNN models on classical von Neumann architectures (central processing units (CPUs)/graphics processing units (GPUs)) demands a large amount of time and power. Therefore, high-speed and low-power hardware implementation of SNNs is essential. Neuromorphic platforms, which are based on event-driven computation, provide an attractive solution to these problems. Thanks to the benefits of neuromorphic hardware, SNNs have become applicable to emerging domains, such as the Internet of Things and edge computing (Mead, 1990; Calimera, Macii, & Poncino, 2013).
Neuromorphic hardware can be divided into analog, digital, and mixed-mode (analog/digital) designs. Although analog implementations offer small area and low power consumption, digital implementations are more flexible and less costly for processing large-scale SNN models (Indiveri et al., 2011; Seo & Seok, 2015). Field-programmable gate arrays (FPGAs) have been considered a suitable candidate for implementing digital neuromorphic platforms. Compared to ASICs, FPGAs offer shorter design and implementation time and excellent stability (Perez-Peña, Cifredo-Chacon, & Quiros-Olozabal, 2020). There have been several attempts to implement SNNs on single FPGA devices, which demonstrate promising speed-ups compared to CPU implementations and lower power consumption compared to GPU implementations (Ju, Fang, Yan, Xu, & Tang, 2020; Zhang et al., 2020).
In this review, we introduce recent progress in spiking neural networks and neuromorphic hardware platforms suitable for their implementation. Section 2 introduces the operation of SNNs and typical spiking neuron models, and section 3 reviews information-encoding schemes. Section 4 discusses the learning algorithms for SNNs, including unsupervised, supervised, conversion, and evolutionary approaches. A performance comparison of hardware and software implementations of SNNs is given in section 5. In section 6, major challenges and future perspectives of spiking neural networks and their neuromorphic implementations are given. Section 7 concludes.
2 Spiking Neural Networks
Schematic of a biological neural network, a spiking neural network, an artificial neural network, and the behavior of a leaky integrate-and-fire spiking neuron.
Table 1: Comparison between SNNs and ANNs.
| | Spiking Neural Network | Artificial Neural Network |
|---|---|---|
| Neuron | Spiking neuron (e.g., integrate-and-fire, Hodgkin-Huxley, Izhikevich) | Artificial neuron (sigmoid, ReLU, tanh) |
| Information representation | Spike trains | Scalars |
| Computation mode | Differential equations | Activation function |
| Topology | LSM, Hopfield network, RSNN, SCNN | RNN, CNN, LSTM, DBN, DNC |
| Features | Real time, low power, online learning, hardware friendly, biologically close, fast and massively parallel data processing | Online learning, computation intensive, moderate parallelization of computations |
2.1 Spiking Neuron Model
A spiking neuron has a structure similar to that of an ANN neuron but exhibits different behavior. Over time, many different neuron models have been developed in the literature, such as the Hodgkin-Huxley (HH), Izhikevich, leaky integrate-and-fire (LIF), and spike response models. These models differ not only in which biological characteristics of real neurons they can reproduce but also in their computational complexity. In this section, we review four popular and representative neuron models that are widely used in the literature, considering their biological plausibility (the neuronal properties or behaviors each model can exhibit) and their computational efficiency (the number of floating-point operations needed to accomplish 1 millisecond (ms) of model simulation).
2.1.1 Hodgkin-Huxley Model
The HH model, the most biologically plausible spiking neuron model, accurately captures the dynamics of many real neurons (Gerstner & Kistler, 2002). However, it is computationally expensive because of the feedback loops between the membrane potential and the gating variables m, h, and n, whose differential equations must be solved continuously. Moreover, the Hodgkin-Huxley model requires about 1200 floating-point computations (FLOPS) per 1 ms of simulation (Paugam-Moisy & Bohte, 2012). Therefore, this model is less suitable for computational intelligence applications, such as large-scale neural network simulations.
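For reference, the standard form of the HH membrane equation, with sodium, potassium, and leak conductances and gating variables obeying first-order kinetics, is given below; the notation follows common textbook usage rather than any formula in the original text.

```latex
C_m \frac{dV}{dt} = I_{\mathrm{ext}}
  - \bar{g}_{\mathrm{Na}}\, m^{3} h\,(V - E_{\mathrm{Na}})
  - \bar{g}_{\mathrm{K}}\, n^{4}\,(V - E_{\mathrm{K}})
  - \bar{g}_{\mathrm{L}}\,(V - E_{\mathrm{L}}),
\qquad
\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\, x, \quad x \in \{m, h, n\}.
```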
2.1.2 Izhikevich Model
The Izhikevich model is a two-dimensional spiking neuron model that offers a good trade-off between biological plausibility and computational efficiency. It can produce various spiking dynamics and requires only 13 FLOPS per 1 ms of simulation (Paugam-Moisy & Bohte, 2012). The Izhikevich model is therefore well suited to the simulation or implementation of large spiking neural networks, for example, in hippocampus simulations and engineering problems.
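As an illustration of why the cost is so low, a minimal sketch of the Izhikevich model with a 1 ms Euler step is shown below; the parameter values (a, b, c, d for regular spiking) and the constant input drive are illustrative assumptions, not taken from the original text.

```python
import numpy as np

def izhikevich(I, T=1000, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """Simulate an Izhikevich neuron for T ms with a dt-ms Euler step.

    I: input current (scalar or array of length T/dt).
    Returns the membrane trace and the spike times (in ms).
    """
    steps = int(T / dt)
    I = np.broadcast_to(I, (steps,))
    v, u = c, b * c                      # membrane potential and recovery variable
    v_trace, spikes = np.empty(steps), []
    for t in range(steps):
        # roughly 13 floating-point operations per 1 ms update
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I[t])
        u += dt * a * (b * v - u)
        if v >= 30.0:                    # spike: reset v and bump u
            spikes.append(t * dt)
            v, u = c, u + d
        v_trace[t] = v
    return v_trace, spikes

v_trace, spikes = izhikevich(I=10.0)     # constant supra-threshold drive
```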
2.1.3 Integrate-and-Fire Model
The integrate-and-fire (IF) neuron and its leaky variant (LIF) are the simplest and most widely used spiking neuron models: the membrane potential integrates incoming synaptic current (and, in the leaky case, decays toward a resting value), and the neuron emits a spike and resets whenever the potential crosses a threshold, requiring only on the order of 5 FLOPS per 1 ms of simulation. There are also more complex variants of the IF model, such as the exponential integrate-and-fire, quadratic integrate-and-fire, and adaptive exponential integrate-and-fire models (Borst & Theunissen, 1999). A minimal simulation sketch of the LIF neuron is given below.
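The following clock-driven LIF sketch illustrates these dynamics; the parameter values (time constant, threshold, input level) are illustrative assumptions.

```python
import numpy as np

def lif_neuron(I, dt=1.0, tau_m=20.0, v_rest=0.0, v_th=1.0, v_reset=0.0, R=1.0):
    """Simulate a single leaky integrate-and-fire neuron.

    I: array of input currents, one value per dt-ms time step.
    Returns the membrane trace and a boolean spike train.
    """
    v = v_rest
    v_trace = np.empty(len(I))
    spikes = np.zeros(len(I), dtype=bool)
    for t, i_t in enumerate(I):
        # leak toward rest plus input drive: tau_m * dv/dt = -(v - v_rest) + R*I
        v += (dt / tau_m) * (-(v - v_rest) + R * i_t)
        if v >= v_th:                # threshold crossing: emit a spike and reset
            spikes[t] = True
            v = v_reset
        v_trace[t] = v
    return v_trace, spikes

# Example: constant supra-threshold input produces a regular spike train.
v_trace, spikes = lif_neuron(I=np.full(200, 1.5))
```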
2.1.4 Spike Response Model
The one-dimensional spike response model is simpler than the other models at the level of the spike-generation mechanism. It offers a low computational cost, requiring about 50 floating-point computations (FLOPS) per 1 ms of simulation, although it provides poor biological plausibility compared with the Hodgkin-Huxley model (Paugam-Moisy, 2006). The kernel computations of this model are relatively complex when implemented in digital systems. However, the equations that define it can be modeled by analog circuits, since the postsynaptic potential function can be seen as the charging and discharging of RC circuits (Iakymchuk, Rosado-Muñoz, Guerrero-Martínez, Bataller-Mompeán, & Francés-Víllora, 2015).
3 Information Coding
Neural coding is still a high-impact research domain for both neuroscientists and computational artificial intelligence researchers (Borst & Theunissen, 1999). Neurons use spikes to communicate with each other in SNN architectures. Therefore, frame-based images and feature vectors need to be encoded into spike trains, a process known as an encoding scheme. This scheme has a significant influence on the performance of the network. Choosing the optimal coding approach depends on the choice of neuron model, the application target, and hardware constraints (Thiele, 2019). Rate encoding and temporal encoding are the two main encoding schemes (Kiselev, 2016).
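As a concrete example, a common rate-coding scheme turns each normalized pixel intensity into a Poisson spike train whose firing rate is proportional to the intensity. The sketch below illustrates this idea; the maximum rate and window length are illustrative assumptions.

```python
import numpy as np

def poisson_rate_encode(image, t_window=100, dt=1.0, max_rate=100.0, seed=0):
    """Encode normalized pixel intensities in [0, 1] as Poisson spike trains.

    Returns a boolean array of shape (time_steps, n_pixels): True = spike.
    """
    rng = np.random.default_rng(seed)
    intensities = np.asarray(image, dtype=float).ravel()
    steps = int(t_window / dt)
    # Probability of a spike in each dt-ms bin: rate (Hz) * dt (s)
    p_spike = intensities * max_rate * (dt / 1000.0)
    return rng.random((steps, intensities.size)) < p_spike

# Example: a random 28x28 "image" becomes a (100, 784) spike raster.
spike_train = poisson_rate_encode(np.random.rand(28, 28))
```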
Spike-based information coding strategies: rate coding, latency coding, rank coding, phase coding, and population coding. The numbers in the circles show the order of spike arrival.
4 Algorithms for SNNs
Learning in a spiking neural network is an arduous task. Backpropagation-based gradient descent learning is a very successful method in traditional artificial neural networks; however, training SNNs is difficult due to the nondifferentiable nature of spike events. As a result, considerable research effort has been devoted to developing suitable learning algorithms that can be applied to multilayer SNNs, which are thus interesting for deep learning. There are four main strategies for training SNNs: unsupervised learning, supervised learning, conversion from trained ANNs, and evolutionary algorithms. These strategies are briefly reviewed in the following subsections.
4.1 Unsupervised Learning
Unsupervised learning is the process of learning without preexisting labels. Unsupervised learning of SNNs is based on the Hebbian rule that consists of adapting the network's synaptic connections to the data received by the neurons (Caporale & Dan, 2008). The spike-timing-dependent plasticity (STDP) algorithm is an implementation of Hebb's rule. STDP is a phenomenon observed in the brain and describes how the efficacy of a synapse changes as a function of the relative timing of presynaptic and postsynaptic spikes. A presynaptic spike in this context is the spike arriving at the synapse of the neuron. The postsynaptic spike is the spike emitted by the neuron itself (Markram, Gerstner, & Sjöström, 2011). The mechanism of STDP is based on the concept that the synapses that are likely to have contributed to the firing of the neuron should be reinforced. Similarly, the synapses that did not contribute or contributed in a negative way should be weakened (Dan & Poo, 2006).
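The canonical pair-based form of this rule can be written as an exponentially decaying weight update. A minimal sketch follows; the time constants and learning rates are illustrative assumptions, not taken from a specific paper.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Pair-based STDP: potentiate if the presynaptic spike precedes the
    postsynaptic spike (dt > 0), depress otherwise.

    t_pre, t_post: spike times in ms; w: current synaptic weight.
    """
    dt = t_post - t_pre
    if dt > 0:                                   # pre before post -> potentiation
        dw = a_plus * np.exp(-dt / tau_plus)
    else:                                        # post before pre -> depression
        dw = -a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w + dw, w_min, w_max))

# Example: a pre spike at 10 ms followed by a post spike at 15 ms strengthens w.
w_new = stdp_update(w=0.5, t_pre=10.0, t_post=15.0)
```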
In recent years, significant research efforts have been focused on training SNNs using STDP. Qu, Zhao, Wang, and Wang (2020) developed two novel hardware-friendly methods, lateral inhibition and homeostasis, which reduce the number of inhibitory connections and thereby lower the hardware overhead. An STDP rule was used to adapt the synapse weights between the input and the learning layer, achieving 92% recognition accuracy on the MNIST data set. Xu et al. (2020) proposed a hybrid learning framework, named deep CovDenseSNN, that combines the biological plausibility of SNNs with the feature extraction of CNNs. An unsupervised STDP learning rule was used to update the parameters of their proposed deep CovDenseSNN model, which is suitable for neuromorphic hardware implementation. Supervised learning and reinforcement learning are other types of STDP-based learning methods (Mozafari, Ganjtabesh, Nowzari-Dalini, Thorpe, & Masquelier, 2018; Mozafari, Kheradpisheh, Masquelier, Nowzari-Dalini, & Ganjtabesh, 2018).
Lee, Panda, Srinivasan, and Roy (2018) proposed a semisupervised strategy to train a convolutional SNN with multiple hidden layers. The training scheme had two steps: initializing the weights of the network by unsupervised learning (namely, SSTDP), and then employing the supervised gradient descent backpropagation (BP) algorithm to fine-tune the synaptic weight. Pretraining approaches led to better generalization, faster training time, and 99.28% accuracy on the MNIST database. Tavanaei, Kirby, and Maida (2018) developed a novel method to train multilayer spiking convolutional neural networks (SCNNs). The training process includes unsupervised (a novel STDP learning scheme for feature extraction) and supervised (a supervised learning scheme to train spiking CNNs (ConvNets)) components.
4.2 Supervised Learning
One of the first algorithms to train SNNs using backpropagation of errors is SpikeProp, proposed by Bohte, Kok, and La Poutre (2002). This model was applied successfully to classification problems using a three-layer architecture. A later, advanced version of SpikeProp called spike train SpikeProp (ST-SpikeProp) used the weight-updating rule of the output layer to train single-layer SNNs (Xu, Zeng, Han, & Yang, 2013). In order to solve the nondifferentiability problem of SNNs, Wu et al. (2018) proposed the spatiotemporal backpropagation (STBP) algorithm, which combines the timing-dependent temporal domain and the layer-by-layer spatial domain. Supervised learning using temporal coding has shown a significant decrease in the energy consumption of SNNs. Mostafa (2017) developed a direct training approach via backpropagation of errors with a temporal coding scheme. His network has no convolutional layers, and the preprocessing method is not general. Zhou, Chen, Ye, and Li (2019) improved on Mostafa's work by incorporating convolutional layers into the SNN, developing a new kernel operation, and proposing a new way to preprocess the input data. Their SCNN achieved high recognition accuracy with fewer trainable parameters. Stromatias, Soto, Serrano-Gotarredona, and Linares-Barranco (2017) presented a supervised method for training a classifier using the stochastic gradient descent (SGD) algorithm and then converting it to an SNN. In other work, Zheng and Mazumder (2018a) proposed backpropagation-based learning for training SNNs. Their proposed learning algorithm is suitable for implementation in neuromorphic hardware.
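A common way to work around the nondifferentiable spike in such direct-training methods is to use a surrogate gradient: the forward pass keeps the hard threshold, while the backward pass substitutes a smooth derivative. The PyTorch sketch below illustrates this generic idea; the fast-sigmoid surrogate and its slope are illustrative choices, not the exact rules used in the cited works.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate in the backward pass."""

    @staticmethod
    def forward(ctx, v_minus_threshold):
        ctx.save_for_backward(v_minus_threshold)
        return (v_minus_threshold > 0).float()          # 1 if the neuron fires, else 0

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Fast-sigmoid surrogate derivative; the slope (10.0) is a hyperparameter.
        surrogate = 1.0 / (1.0 + 10.0 * x.abs()) ** 2
        return grad_output * surrogate

spike_fn = SurrogateSpike.apply

# Example: membrane potentials relative to threshold; gradients flow through the surrogate.
v = torch.randn(4, requires_grad=True)
spikes = spike_fn(v)
spikes.sum().backward()
print(v.grad)
```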
4.3 Conversion from Trained ANN
In the third technique, an offline-trained ANN is converted to an SNN so that the transformed network can take advantage of a well-established, fully trained ANN model. This approach is often called "spike transcoding" or "spike conversion." Converting an ANN to an SNN offers several benefits. First, simulating the exact spike dynamics of a large network can be computationally expensive, particularly if high firing rates and precise spike times are required. The conversion approach therefore allows applying SNNs to complex benchmark tasks that require large networks, such as ImageNet or CIFAR-10, with only a small accuracy loss compared to the original ANNs (Sengupta, Ye, Wang, Liu, & Roy, 2018; Hu, Tang, & Pan, 2018). Second, we can leverage highly efficient training techniques developed for ANNs and convert many state-of-the-art deep networks for classification tasks to SNNs. Moreover, the optimization process can be performed on the ANN, which permits the use of state-of-the-art optimization procedures and GPUs for training (Diehl et al., 2015). The main disadvantage is that the conversion technique fails to provide on-chip learning capability. Furthermore, some particularities of SNNs, which do not exist in the corresponding ANNs, cannot be considered during training. For this reason, the inference performance of the converted SNNs is typically lower than that of the original ANNs (Pfeiffer & Pfeil, 2018).
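A typical step in such conversion pipelines is data-based weight (or threshold) normalization: each layer's weights are rescaled using the maximum activations observed in the trained ANN so that spiking neurons' firing rates stay within their dynamic range. The sketch below illustrates layer-wise scaling by a high percentile of activations; it is an illustrative assumption, not the exact procedure of any specific paper cited here.

```python
import numpy as np

def normalize_weights(weights, activations, percentile=99.9):
    """Layer-wise weight normalization for ANN-to-SNN conversion.

    weights: list of weight matrices, one per layer.
    activations: list of activation arrays recorded on training data,
                 activations[l] holds the ReLU outputs of layer l.
    Returns rescaled weights suitable for IF neurons with a unit threshold.
    """
    norm_weights = []
    prev_scale = 1.0
    for W, acts in zip(weights, activations):
        scale = np.percentile(acts, percentile)          # robust "max" activation
        # Rescale so the layer's maximum input drive is about 1 per time step.
        norm_weights.append(W * prev_scale / scale)
        prev_scale = scale
    return norm_weights
```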
Significant research has been carried out to convert an ANN to an SNN with successful performance on the MNIST data set. Diehl et al. (2015) proposed a technique for converting an ANN into an SNN that has the minimum performance loss in the conversion process, and a recognition rate of 98.64% was achieved on the MNIST database. In another work, Rueckauer, Lungu, Hu, Pfeiffer, and Liu (2017) converted continuous-valued deep CNN to accurate spiking equivalent. This network, which includes common operations such as softmax, max-pooling, batch normalization, biases, and inception modules, demonstrates a recognition rate of 99.44% on the MNIST data set. Xu, Tang, Xing, and Li (2017) proposed a conversion method that is suitable for mapping on neuromorphic hardware. They presented a threshold rescaling method to reduce the loss and achieved a maximum accuracy of 99.17% on the MNIST data set. Xu et al. (2020) established an efficient and hardware-friendly conversion rule to convert CNNs into spiking CNNs. They proposed an “-scaling” weight mapping method that achieves high accuracy and low-latency classification on the MNIST data set. Wang, Xu, Yan, and Tang (2020) proposed a weights-thresholds balance conversion technique that needs fewer memory resources and achieves high recognition accuracy on the MNIST data set. In contrast to the existing conversion techniques, which focus on the approximation between the artificial neurons' activation values and the spiking neurons' firing rates, they focused on the relationship between weights and thresholds of spiking neurons during the conversion process.
4.4 Evolutionary Spiking Neural Networks
Evolutionary algorithms (EAs) are population-based metaheuristics. Historically, their design was motivated by observations about natural evolution in biological populations. Such algorithms can be used to directly optimize the network topology and model hyperparameters or to optimize synaptic weights and delays (Saleh, Hameed, Najib, & Salleh, 2014; Schaffer, 2015). Currently, evolutionary algorithms such as differential evolution (DE), grammatical evolution (GE), the harmony search algorithm (HSA), and particle swarm optimization (PSO) are used to learn the synaptic weights of SNNs. Vazquez (2010), López-Vázquez et al. (2019), and Yusuf et al. (2017) have shown how the synaptic weights of a spiking neuron, including the integrate-and-fire, Izhikevich, and spike response model (SRM) neurons, can be trained using algorithms such as DE, GE, and HSA to perform classification tasks. Vazquez and Garro (2011) applied the PSO algorithm to train the synaptic weights of a spiking neuron in linear and nonlinear classification problems. They discovered that input patterns of the same class produce equal firing rates. A parallel differential evolution approach was introduced by Pavlidis, Tasoulis, Plagianakos, Nikiforidis, and Vrahatis (2005) for training supervised feedforward SNNs. Their approach was tested only on the exclusive-OR problem, which does not demonstrate its benefits. Evolutionary algorithms can be an alternative to exhaustive search. However, they are very time-consuming, notably because the fitness function is computationally expensive (Gavrilov & Panchenko, 2016). A schematic sketch of this strategy is given below.
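As an illustration of this strategy, the synaptic weight vector can be handed to an off-the-shelf evolutionary optimizer such as SciPy's differential evolution; the fitness function `evaluate_snn` below is a hypothetical placeholder for whatever spiking simulation and task loss the user provides.

```python
import numpy as np
from scipy.optimize import differential_evolution

def evaluate_snn(weights):
    """Hypothetical fitness: run the SNN with these weights on the training set
    and return a loss to minimize (e.g., 1 - classification accuracy)."""
    # ... simulate the spiking network and score it ...
    return float(np.sum(weights ** 2))        # dummy stand-in so the sketch runs

n_weights = 20                                # illustrative network size
bounds = [(-1.0, 1.0)] * n_weights            # search range per synaptic weight

result = differential_evolution(evaluate_snn, bounds, maxiter=50, popsize=15, seed=0)
best_weights, best_loss = result.x, result.fun
```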
Table 2 shows the models for developing SNNs—their architectures and learning type along with their accuracy rates on the MNIST data set. This comparison provides an insight into different SNN architectures and learning mechanisms to choose the right tool for the right purpose in future investigations.
Table 2: Summary of Recent SNN Learning Models and Their Accuracy on the MNIST Handwritten Digits Data Set.
| Reference | Network Type | Encoding Method | Structure Configuration | Neuron Type | Learning Type | Learning Rule | Training/Test Samples | CA (%) |
|---|---|---|---|---|---|---|---|---|
| Mirsadeghi et al. (2021) | SNN | Temporal coding | 784-500-10 | Linear SRM | Supervised | STiDi-BP | 60,000/10,000 | 97.4 |
| Fu & Dong (2021) | Spiking CNN | Rank order coding | NA | LIF | Unsupervised | Variable threshold STD | 60,000/10,000 | 99.27 |
| Qu et al. (2020) | SNN | Temporal coding | 784-400-400-10 | Nonleaky IF | Unsupervised | STDP | 60,000/10,000 | 92 |
| Xu et al. (2020) | Deep CovDenseSNN | Rate coding | 6C6@28×28-12C5-24C5-P | LIF | Unsupervised | Hybrid spike-based learning, STDP | 60,000/10,000 | 91.4 |
| Xu et al. (2020) | Spiking CNN | Rate coding | 5C5-2P-64C5-2P-10FC | IF | Supervised | Conversion rule | 70,000 | 99.09 |
| Wang et al. (2020) | Deep SNN | Rate coding | 64C3-MP2-64C3-2MP-128FC-10 | LIF | Supervised | Weights-thresholds balance conversion | 60,000/10,000 | 99.43 |
| Zhou et al. (2019) | Spiking CNN | Temporal coding | NA | Nonleaky IF | Supervised | Temporal backpropagation | 60,000/10,000 | 98.50 |
| Zheng and Mazumder (2018a) | SNN | Rate coding | 784-300-100-10 | LIF | Supervised | Online learning stochastic GD | 60,000/10,000 | 97.8 |
| Kulkarni and Rajendran (2018) | SNN | NA | 784-8112-10 | LIF | Supervised | Normalized approximate descent | 50,000/10,000 | 98.17 |
| Lee et al. (2018) | Deep CovDenseSNN | Rate coding | 28×28-20C5-2P-50C5-2P-200FC-10FC | LIF | Semisupervised | STDP-based pretraining, backpropagation | 60,000/10,000 | 99.28 |
| Shrestha and Orchard (2018) | Deep SNN | Temporal coding | 28×28-12c5-2a-64c5-2a-10o | LIF | Supervised | Backpropagation | 60,000/10,000 | 99.36 |
| Tavanaei et al. (2018) | Spiking CNN | Temporal coding | 64C5-2P-1500FC-10FC | LIF | Both | STDP rep. learning and BP-STDP | 60,000/10,000 | 98.60 |
| Mostafa (2017) | SNN | Temporal coding | 784-400-400-10 | Nonleaky IF | Supervised | Temporal backpropagation | 60,000/10,000 | 97.14 |
| Xu et al. (2017) | Spiking ConvNet | Rate coding | 28×28-32c5-2s-64c5-2s-1024f-10o | IF | Supervised | Conversion rule | 20,000 | 99.17 |
| Stromatias et al. (2017) | Spiking CNN | Temporal coding | NA | LIF | Supervised | Stochastic GD | 60,000/10,000 | 98.42 |
New concepts and architectures are still frequently tested on MNIST. However, we argue that the MNIST data set does not include temporal information and does not provide spike events generated from sensors. Compared to a static data set, a dynamic data set contains richer temporal features and is therefore more suitable for exploiting an SNN's potential. Event-based benchmark data sets include N-MNIST (Orchard, Jayawant, Cohen, & Thakor, 2015), CIFAR10-DVS (Hongmin Li, Liu, Ji, Li, & Shi, 2017), N-CARS (Sironi, Brambilla, Bourdis, Lagorce, & Benosman, 2018), DVS-Gesture (Amir et al., 2017), and SHD (Cramer, Stradmann, Schemmel, & Zenke, 2020). Table 3 shows the models for developing SNNs—their architectures and learning types along with their accuracy rates on these neuromorphic data sets.
Table 3: Summary of Recent SNN Learning Models and Their Accuracy on Event-Based Data Sets.
| Reference | Network Type | Learning Rule and Structure Configuration | Data Set | CA (%) |
|---|---|---|---|---|
| Kugele et al. (2020) | SNN | ANN-to-SNN conversion | N-MNIST | 95.54 |
| | | | CIFAR10-DVS | 66.61 |
| | | | DvsGesture | 96.97 |
| | | | N-CARS | 94.07 |
| Wu et al. (2018) | Spiking MLP | Spatiotemporal backpropagation (STBP), 34×34×2-800-10 | N-MNIST | 98.78 |
| Wu et al. (2019) | SNN | Spatiotemporal backpropagation (STBP), 128C3(Encoding)-128C3-AP2-384C3-384C3-AP2-1024FC-512FC-Voting | N-MNIST | 99.53 |
| | | | CIFAR10-DVS | 60.5 |
| Zheng et al. (2020) | ResNet17 SNN | Threshold-dependent batch normalization based on spatiotemporal backpropagation (STBP-tdBN) | CIFAR10-DVS | 67.80 |
| | | | DvsGesture | 96.87 |
| Lee et al. (2016) | SNN | Supervised backpropagation, (34×34×2)-800-10 | N-MNIST | 98.66 |
| Yao et al. (2021) | Spiking CNN | Temporal-wise attention SNN (TA-SNN), Input-MP4-64C3-128C3-AP2-128C3-AP2-256FC-11 | DvsGesture | 98.61 |
| | | TA-SNN, Input-32C3-AP2-64C3-AP2-128C3-AP2-256C3-AP2-512C3-AP4-256FC-10 | CIFAR10-DVS | 72 |
| | | TA-SNN, Input-128FC-128FC-20 | SHD | 91.08 |
| Neil and Liu (2016) | Spiking CNN | ANN-to-SNN conversion | N-MNIST | 95.72 |
5 Available Hardware and Software/Frameworks
Different methods can be used for neural network implementation. Computational cost, speed, and configurability are the main concerns for these implementations. Although CPU-based simulations offer relatively high-speed execution, CPUs are designed for general-purpose, everyday applications. They also execute serially, which limits the number of neurons that can be simulated at the same time. Hardware implementations instead provide a platform for parallel processing. Although analog implementation is relatively efficient, it suffers from an expensive and long design and implementation process. FPGAs instead offer a configurable platform with parallel processing, which makes them a suitable candidate for SNN implementations.
5.1 Available Software
There are many different SNN simulators—for example, BindsNET, Nengo, NeMo, Brian2GeNN, NEST, and CARLsim. Existing simulators differ in the level of biological detail they model, their computational speed, and the hardware platforms they support. They are classified into three main groups depending on how the dynamics of the neural model are evaluated: event-driven (asynchronous), where the membrane potential is modified only when a spike arrives; clock-driven (synchronous), where the neural state is updated at every tick of a clock; and hybrid strategies (asynchronous and synchronous) (Rudolph-Lilith, Dubois, & Destexhe, 2012).
Event-driven simulators are not as widely used as clock-driven simulators due to their implementation complexity. Moreover, they are difficult to parallelize because of their sequential nature. Their main advantage is higher operation speed, because they do not compute small update steps for every neuron. Another benefit of event-driven simulators is that the timing of spikes can be represented with high precision. These simulators are most suitable for neural network layers with low and sparse activity (Naveros, Garrido, Carrillo, Ros, & Luque, 2017).
The majority of SNN simulators are clock-driven. Because of their high parallelism, clock-driven simulators can take full advantage of parallel computing resources on CPU and GPU platforms. CPU-based clock-driven platforms perform better for small and medium-size groups of neurons with low to medium mathematical complexity, whereas GPU-based clock-driven platforms perform better for large groups of neurons with high mathematical complexity. The main advantage of clock-driven simulators is that they are suitable for simulating large networks in which a large number of events is triggered. Many of these simulators are built on top of existing deep learning frameworks because their structure is similar to that of an ANN simulation. Their main disadvantages are that spike timings are aligned to the ticks of the clock and threshold conditions are checked only at those ticks (Brette et al., 2007). Selecting the most appropriate technique requires a trade-off among three elements: (1) the neural network architecture (e.g., number of neurons, neural model complexity, number of input and output synapses, mean firing rates), (2) the hardware resources (number of CPU and GPU cores, RAM size), and (3) the simulation requirements and targets.
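To make the distinction concrete, the sketch below contrasts the two update strategies for a single layer of LIF-like neurons; the weight layout and parameters are illustrative assumptions.

```python
import heapq
import numpy as np

def clock_driven(weights, input_spikes, dt=1.0, tau=20.0, v_th=1.0):
    """Clock-driven: every neuron is updated at every tick, spikes or not.

    weights: (n_in, n_out); input_spikes: (steps, n_in) boolean array.
    """
    v = np.zeros(weights.shape[1])
    out = []
    for step in range(input_spikes.shape[0]):
        v += (dt / tau) * (-v) + input_spikes[step].astype(float) @ weights
        fired = v >= v_th
        out.extend((step * dt, j) for j in np.where(fired)[0])
        v[fired] = 0.0
    return out

def event_driven(weights, events, tau=20.0, v_th=1.0):
    """Event-driven: state is advanced only when a presynaptic spike arrives.

    events: iterable of (time_ms, presynaptic_index) tuples.
    """
    v = np.zeros(weights.shape[1])
    last_t, out, queue = 0.0, [], list(events)
    heapq.heapify(queue)
    while queue:
        t, pre = heapq.heappop(queue)
        v *= np.exp(-(t - last_t) / tau)       # analytic decay since the last event
        last_t = t
        v += weights[pre]
        fired = np.where(v >= v_th)[0]
        out.extend((t, j) for j in fired)
        v[fired] = 0.0
    return out
```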
Among the SNN simulators that have been reported in the literature are BindsNET (Hazan et al., 2018), Nengo (Bekolay et al., 2014), NeMo (Fidjeland, Roesch, Shanahan, & Luk, 2009), GeNN (Yavuz et al., 2016), Brian 2 (Stimberg, Brette, & Goodman, 2019), Brian2GeNN (Stimberg, Goodman, & Nowotny, 2020), NEST (Gewaltig & Diesmann, 2007), CARLsim (Beyeler, Carlson, Chou, Dutt, & Krichmar, 2015; Chou et al., 2018), NeuCube (Kasabov, 2014), PyNN (Davison, 2009), ANNarchy (Vitay, Dinkelbach, & Hamker, 2015), and NEURON (Hines & Carnevale, 1997). There are some major criteria for choosing an SNN simulator: it should be open access; easy to debug and run; able to target various hardware, such as ASICs and FPGAs, to execute the simulation; and able to support the required level of biological complexity. We describe the main features of prominent existing SNN simulators in Table 4.
BindsNET is an open-source Python package for the rapid building and simulation of SNNs, developed on top of the PyTorch deep learning library for its matrix computation. BindsNET allows researchers to test software prototypes on CPUs or GPUs and then deploy the model to dedicated hardware (Hazan et al., 2018).
Nengo is a neural simulator based on the neural engineering framework for simulating both large-scale spiking and nonspiking neural models. It is written in Python and supports the TensorFlow back end. This Python library allows users to define neuron types, learning rules, and optimization methods (Bekolay et al., 2014).
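A minimal Nengo model based on the library's standard workflow is shown below; the toy network (a sinusoidal input decoded by one spiking ensemble) is an illustrative example, not taken from the cited paper.

```python
import numpy as np
import nengo

with nengo.Network() as model:
    stimulus = nengo.Node(lambda t: np.sin(2 * np.pi * t))   # time-varying input
    ensemble = nengo.Ensemble(n_neurons=100, dimensions=1)   # population of spiking neurons
    nengo.Connection(stimulus, ensemble)
    probe = nengo.Probe(ensemble, synapse=0.01)              # filtered decoded output

with nengo.Simulator(model) as sim:
    sim.run(1.0)                                             # simulate one second
decoded = sim.data[probe]
```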
NeMo, a C++ class library for simulating SNNs, can simulate tens of thousands of neurons on a single workstation. It has bindings for Matlab and Python and is one of the supported back ends for the PyNN simulator interface (Fidjeland et al., 2009).
GeNN is an open-source library for accelerating SNN simulations on CPUs or GPUs via code generation technology (Yavuz, Turner, & Nowotny, 2016).
Brian is a popular open-source simulator for SNNs written in Python. It is highly flexible, easily extensible, and commonly used in computational neuroscience. Version 2 of Brian (Brian 2) allows scientists to efficiently simulate SNN models (Stimberg et al., 2019). In a newly developed software package, Brian2GeNN, the GPU-enhanced neural network simulator (GeNN) can be used to accelerate simulations written for the Brian simulator (Stimberg et al., 2020).
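A minimal Brian 2 script illustrating its equation-oriented interface follows; the toy LIF group and its parameters are illustrative assumptions.

```python
from brian2 import NeuronGroup, SpikeMonitor, run, ms

# A small group of leaky integrate-and-fire neurons driven toward v = 2.
eqs = 'dv/dt = (2 - v) / (10*ms) : 1'
group = NeuronGroup(10, eqs, threshold='v > 1', reset='v = 0', method='exact')
monitor = SpikeMonitor(group)

run(100 * ms)                     # simulate 100 ms
print(monitor.count)              # spikes emitted by each neuron
```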
Another popular open-source simulator for SNNs is NEST, which focuses on the dynamics, size, and structure of neural networks. It is suitable for large networks of spiking neurons (Gewaltig & Diesmann, 2007). CARLsim is a user-friendly, GPU-accelerated SNN library written in C++ that supports CPU-GPU co-execution (Beyeler et al., 2015). Version 4 of CARLsim has been improved to simulate large-scale SNN models with real-time constraints (Chou et al., 2018). Table 4 shows the features of the best-known SNN simulation software.
Table 4: Features of the Best-Known SNN Software.
| Simulator | Open Source | GPU | Simulation | Programming Language | Features |
|---|---|---|---|---|---|
| BindsNET | Yes | Yes | Clock-driven | C (Python package) | Can be connected to different hardware (e.g., FPGA, ASIC) to execute the simulations; provides an interface to the OpenAI Gym library for training SNNs in reinforcement learning environments; suitable for applications in the domain of machine learning; a Torchvision data set has been integrated into the library for computer vision tasks. |
| Nengo | Yes | Yes | Clock-driven | C (Python package) | Focuses on high-level behaviors of spiking neural networks; supports multithreaded execution on CPUs; has simulation back ends such as NengoFPGA, Nengo Loihi, Nengo SpiNNaker, and Nengo OpenCL; its libraries are designed to help with deep learning, adaptive control, and cognitive modeling. |
| NeMo | Yes | Yes | Clock-driven | C | Supports multithreaded execution on CPUs; a good candidate for applications in machine learning; can simulate different neuron models and hardware configurations; can run on CUDA-enabled GPUs. |
| Brian2GeNN | Yes | Yes | Clock-driven | C (Python package) | Supports all major operating systems; takes advantage of both GeNN and Brian 2, allowing users to run their Brian 2 scripts on NVIDIA GPU accelerators without any further programming; can make simulations tens to hundreds of times faster. |
| NEST | Yes | No | Hybrid | C (Python package) | Fast and memory efficient with minimal dependencies; best suited for models that focus on the dynamics, size, and structure of neural systems; can take advantage of multicore computers in a local network to increase the available memory or to speed up the simulation. |
| CARLsim | Yes | Yes | Clock-driven | C (Python package) | Balances flexibility and performance by providing optimized CUDA/C implementations of a large number of biologically plausible model features; provides an easy-to-use programming interface; provides an integrated automatic parameter-tuning interface for spiking neural networks; allows users to add a new feature with minimal effort; finds the open parameters of an SNN that best fit a given response behavior. |
5.2 Available Hardware
Spiking neuromorphic hardware can be subdivided into analog, digital, and mixed-mode (analog/digital) designs. Analog hardware uses physical processes to model certain computational functions of artificial neurons. The advantage of this approach is that operations that might be costly to implement as explicit mathematical operations can be realized very efficiently by the natural dynamics of the system (Neil & Liu, 2016). Additionally, real-valued physical variables can have almost infinite precision. Analog hardware implementations differ in the degree to which analog elements are used. Many implementations perform only the computation within the neuron with analog elements, keeping the communication of spike signals digital (Camuñas, Linares-Barranco, & Serrano-Gotarredona, 2019).
Digital hardware represents all variables of the neurons by bits, just like a classical computer. This means that the precision of variables depends on the number of bits used to represent them. This precision also strongly influences the energy consumption of the basic operations and the memory requirements for variable storage. The great advantage of digital designs compared to analog hardware is that the precision of variables is controllable and guaranteed. Additionally, digital hardware can be designed with established state-of-the-art techniques for chip design and manufacturing. Digital solutions can be implemented on either FPGAs or application-specific integrated circuits (ASICs) (Schuman et al., 2017). Due to the high production costs of ASICs, many research groups have focused on implementing SNNs on FPGAs instead.
5.2.1 Learning with Neuromorphic Hardware
Learning mechanisms are crucial for the ability of neuromorphic systems to adapt to specific applications. Depending on the number of hyperparameters involved, various types of learning can be performed, and the learning time can vary greatly. When such learning is performed in a neuromorphic chip, the learning is referred to as on-chip training (Lee, Lee, Kim, Lee, & Seo, 2020). In order to perform on-chip training, the neuromorphic chip must provide almost all of the functions required for learning (Walter, Röhrbein, & Knoll, 2015). Off-chip training is a method of implementing learning outside a neuromorphic chip using, for example, software. After external learning is completed, the weights are postprocessed according to the neuromorphic system, or the neuromorphic system is fabricated using the postprocessed weights.
Whether to implement on-chip or off-chip training depends on the application under consideration. If the objective is to design a general accelerator for machine learning, the chip should obviously allow on-chip training (Burr et al., 2015). If the purpose is to perform a single machine learning task on embedded low-power hardware, off-chip learning, which is potentially power consuming, can be performed only once, after which the resulting network is programmed on-chip. One could argue that in some cases, the system needs to adapt to its sensing environment while operating, which is referred to as online learning. One solution is to enable off-chip training between operation periods and to update or fine-tune the SNN during inactive or loading time. However, this approach still has some drawbacks; for example, it requires adding a memory to store the input data acquired during operation. In addition, online learning is still being researched because machine learning currently has the major drawback of catastrophic forgetting, which means that a trained network cannot learn a new task without losing accuracy on its previously learned task (Zheng & Mazumder, 2018b).
For many years, STDP has been the algorithm of choice for implementing machine learning tasks in spiking neuromorphic systems (Diehl & Cook, 2014). It is popular in the neuromorphic community for several reasons. First, the field of neuromorphic computing has traditionally been inspired by biology, which is why early approaches to learning in neuromorphic hardware were inspired by mechanisms observed in the brain. In addition, STDP is straightforward to implement in analog neuromorphic hardware: its time dependence is often modeled by an exponential decay, which can simply be computed by analog electronic elements. Finally, supervised learning algorithms require either complex neuron and synapse models or floating-point communication of gradients between layers, and thus between neurocores, which makes their hardware implementation impractical. Moreover, if the weight update is performed online (i.e., during inference), the feedforward operation must be paused for learning, which adds an operational delay to the system.
5.2.2 Large-Scale Neuromorphic Hardware
Evaluation of large-scale neural networks requires dedicated hardware that is highly configurable. The well-known neuromorphic architectures TrueNorth (Merolla et al., 2014), Neurogrid (Benjamin et al., 2014), BrainScaleS (Schemmel et al., 2010), Loihi (Davies et al., 2018), and SpiNNaker (Furber, Galluppi, Temple, & Plana, 2014) offer various characteristics for emulating networks of spiking neurons. (Note that this review addresses the well-known fully digital and mixed digital-analog neuromorphic hardware.)
The IBM TrueNorth chip is a neuromorphic platform implemented in digital electronics. This chip is designed for the evaluation of large-scale networks and is closer in structure to the human brain than to the von Neumann architecture used in conventional computers. A single TrueNorth chip contains 5.4 billion transistors and 4096 neurosynaptic cores. Each core includes 12.75 KB of local static random-access memory (SRAM), 256 neurons, 256 axons, and a 256 × 256 synapse crossbar. The chip can simulate up to 1 million neurons and 256 million synapses. A TrueNorth chip is programmable via the Corelet programming language (Merolla et al., 2014).
Neurogrid is a mixed digital-analog neuromorphic device that targets real-time simulation of biological brains. The Neurogrid board is composed of 16 complementary metal-oxide-semiconductor (CMOS) NeuroCore chips, each of which has 256 × 256 analog neurons fabricated in 180 nm CMOS technology. The board is able to perform real-time biological simulations of the brain with billions of synaptic connections and 1 million neurons (Benjamin et al., 2014).
BrainScaleS is a mixed-mode analog/digital neuromorphic hardware system, based on physical emulations of neuron, synapse, and plasticity models, that targets the emulation of brain-size neural networks. The system is composed of 8-inch silicon wafers, each capable of emulating up to 50 million plastic synapses and 200,000 neurons. Adaptive exponential IF neuron models and synapses are implemented in an analog network core structure in the BrainScaleS system. The communication units in the system are digital, while the processing units are analog circuits (Schemmel et al., 2010).
A fully digital neuromorphic research chip known as Loihi has been designed by Intel Labs to implement SNNs. The chip is fabricated in Intel's 14 nm process technology and contains 128 neuromorphic cores, along with three managing Lakemont cores. The Loihi chip can implement up to 130,000 neurons and 130 million synapses. Moreover, a learning engine embedded in each core enables on-chip learning with various learning rules, which makes Loihi flexible enough for supervised, unsupervised, and reinforcement learning models. It can process information up to 1000 times faster and 10,000 times more efficiently than conventional processors, which makes it an ideal candidate for solving specific types of optimization problems (Davies et al., 2018).
SpiNNaker is a large digital neuromorphic system designed to simulate large-scale neural computational models in real time. The SpiNNaker board consists of 48 chips, each containing 18 ARM microprocessors and a network on chip (NoC). Each core contains an ARM968 processor and a direct memory access (DMA) controller and can implement almost 1000 spiking neurons in real time. One of the advantages of SpiNNaker is its asynchronous communication scheme. The PyNN interface makes the SpiNNaker board programmable; PyNN is a Python library that provides various spiking neuron models and synaptic plasticity rules. This neuromorphic platform has been used in neuroscience applications, such as simulations of the visual cortex or the cerebellum (Furber, 2016).
The main features of these neuromorphic systems are shown in Table 5. Note that for the execution of the network, only the learning approaches that are implemented on-chip to run online are reviewed in this table.
Table 5: Summary of Neuromorphic System Implementations.
| Platform | Electronics | Technology | Chip Area (mm²) | On-Chip Learning | Neuron Model | Neurons per Chip | Synapse Model | Synapses per Chip | Online Learning | Power |
|---|---|---|---|---|---|---|---|---|---|---|
| BrainScaleS | Analog/digital | ASIC CMOS 180 nm | 50 | Yes (STDP) | Adaptive exponential IF | 512 | Spiking 4-bit digital | 100 K | Yes | 2 kW per module (peak) |
| TrueNorth | Digital | ASIC CMOS 28 nm | 430 | No | LIF | 1 million | Binary, 4 modulators | 256 M | No | 65 mW (per chip) |
| SpiNNaker | Digital | ASIC CMOS 130 nm | 102 | Yes (synaptic plasticity rules) | LIF, IZH, HH | 16,000 | Programmable | 16 M | Yes | 1 W (per chip) |
| Neurogrid | Analog/digital | ASIC CMOS 180 nm | 168 | No | Adaptive quadratic IF | 65,000 | Shared dendrite | 100 M | Yes | 2.7 W |
| Loihi | Digital | ASIC CMOS 14 nm | 60 | Yes (with plasticity rule) | Adaptive LIF | 131,000 | N/A | 126 M | Yes | 0.45 W |
5.2.3 FPGA-Based Implementation of SNN
SNN algorithms have a parallel and distributed nature, and today's conventional computer architectures and software are not well suited to executing them. An alternative approach is to accelerate SNN applications with dedicated hardware. Neuromorphic hardware is designed to minimize energy and cost while maintaining maximum accuracy. It offers promising speed-ups compared with software running on CPUs, with lower power consumption than GPUs.
Several neuromorphic accelerators have been used for implementing SNNs. However, they face some limitations, such as the maximum fan-in/fan-out of a neuron and limited synaptic precision, and they are not suited for embedded systems due to their high cost (Ji et al., 2016). FPGAs, as programmable and low-cost devices, can address these issues: they exhibit high performance and reconfiguration capability and are more energy efficient than current CPUs and GPUs. Furthermore, they support parallel processing and contain enough local memory to store weights, which makes them suitable candidates for implementing SNNs (Guo, Yantir, Fouda, Eltawil, & Salama, 2021). Rahman (2017) demonstrated that with a single CPU the processing time is slow (around 1 minute per image), but with successful FPGA hardware acceleration, even using a more complex network with more filters and convolutional layers, it becomes possible to use SNNs in real-time scenarios (about 1 second per image). Compared to application-specific integrated circuits (ASICs), FPGAs are suitable candidates for implementing digital neuromorphic platforms: they provide rapid design and fabrication time, low cost, high flexibility, a more straightforward computer interface, and excellent stability. While the improvement potential of FPGAs is high, there are still many open research questions that limit their current mainstream appeal.
Implementing neural networks on FPGAs is time-consuming compared to CPUs and GPUs. An important reason that FPGAs are still not as widely used as general-purpose hardware platforms like CPUs and GPUs in neural network computing is their relatively low programmability (Hofmann, 2019). Software frameworks such as Caffe and TensorFlow natively target hardware such as CPUs and GPUs running standard operating systems. Although high-level synthesis (HLS) improves the development cycle on FPGAs, efficient HLS system design still requires a deep understanding of hardware details, which can be a problem for general neural network developers (Zhang & Kouzani, 2020). There is still a need for FPGA-based frameworks that support mainstream software neural network libraries like TensorFlow and Caffe.
Several studies have reported different approaches for implementing SNNs on FPGAs for various applications. FPGA-based implementation of SNNs has been presented for classifying musical notes (Cerezuela-Escudero et al., 2015), electrocardiogram (ECG), edge detection (Qi et al., 2014), real-time image dewarping (Molin et al., 2015), locomotion systems (Guerra-Hernandez et al., 2017), biomimetic pattern generation (Ambroise, Levi, Joucla, Yvert, & Saïghi, 2013), and event-driven vision processing (Yousefzadeh, Serrano-Gotarredona, & Linares-Barranco, 2015).
Note that we focus here on recent FPGA-based implementations of SNNs for the image classification domain, currently a significant field of machine learning. Many research groups are now concentrating their efforts on developing reservoir computing for solving various classification and recognition problems. Tanaka et al. (2019) summarized recent advances in physical reservoir computing, including analog circuits and FPGAs. Yi et al. (2016) developed a real-time, hardware-based FPGA architecture for reservoir-computing-based training of recurrent neural networks (RNNs). Numerous studies have focused on designing suitable neuromorphic architectures for liquid state machines (LSMs) on FPGAs (Liu, Jin, & Li, 2018; Wang, Jin, & Li, 2015; Jin, Liu, & Li, 2016).
There have been several attempts to implement SNNs on FPGAs for pattern recognition. Ju et al. (2020) proposed an FPGA-based deep SNN implementation. They applied a hardware-friendly spiking max-pooling operation and two parallel methods, shift registers and coarse-grained parallelism, to improve the data reuse rate. The FPGA implementation obtained 22 times lower power consumption than a GPU implementation and a 41 times speed-up compared to a CPU implementation. Abderrahmane and Miramond (2019) explored spike-based neural networks for embedded artificial intelligence applications. They implemented two architectures, time-multiplexed and fully parallel, on an FPGA platform. However, the FPGA on-chip memory is not sufficient for deeper networks with these two architectures. Efficient memory access is essential for storing the parameters and evaluating an SNN. On-chip memory is limited, and off-chip memory consumes more energy than on-chip memory; thus, a suitable architecture can reduce memory accesses. Nallathambi and Chandrachoodan (2020) proposed a novel probabilistic spike-propagation method that reduces the number of off-chip memory accesses required to evaluate an SNN, thus saving time and energy.
To take advantage of both event-based and frame-based processing, Yousefzadeh, Orchard, Stromatias, Serrano-Gotarredona, and Linares-Barranco (2018) proposed a hybrid neural network that combines SNN and ANN features. Their FPGA implementation consumes 7 μJ per frame and obtains 97% accuracy on the MNIST database. In similar work, Losh and Llamocca (2019) designed the spiking hybrid network (SHiNe), FPGA-based hardware that achieved reasonable accuracy (90%) on the MNIST data set. The SHiNe design has significantly lower FPGA resource utilization (about 35% less) due to two factors: the network (the SHiNe network is significantly simpler than a standard neural network, requiring only 1 bit per signal) and the neuron implementation (each SHiNe neuron includes only a counter and a set of comparators). They also implemented an approach named thrifting, which limits the number of allowed connections from neurons in one layer to a neuron in the next layer. Their FPGA designs on the Zynq XC7Z010 PSoC board consume far less power than GPU or CPU implementations. Zhang et al. (2020) developed an FPGA-based SNN implementation that provides a 908,578 times speed-up compared with a software implementation. They reduced the consumption of hardware resources by using arithmetic shifts instead of multiplication operations, which speeds up training.
Han, Li, Zheng, and Zhang (2020) proposed an FPGA-based SNN hardware implementation that supports up to 16,384 neurons and 16.8 million synapses with 0.477 W power consumption. They used a hybrid updating algorithm that combines time-stepped and event-driven updating. In addition to on-chip block random-access memory (RAM), they used external DDR memory to optimize the latency of memory access.
Kuang et al. (2019) introduced a real-time FPGA-based implementation of SNNs that significantly reduces hardware resource costs through multiplier-less approximation. Their proposed system is suitable for bio-inspired neuromorphic platforms and online applications. An FPGA-based parallel neuromorphic processor for SNNs presented by Wang, Li, Shao, Dey, and Li (2017) successfully tackled several critical problems related to memory organization and parallel processing. The 32-way parallel design achieved a 59.4 times training speed-up, and the use of approximate multipliers reduced energy consumption by up to 20%. An FPGA-based SNN hardware implementation with biologically realistic neuron and synapse models, proposed by Fang, Shrestha, Zhao, Li, and Qiu (2019), applied a population encoding scheme to convert continuous values into spike events. The FPGA implementation achieves 196 times lower power consumption and a 10.1 times speed-up compared to a GPU implementation. Their experiments also demonstrate that the temporally encoded SNN achieves an 8.43 times speed-up over the rate-encoded SNN on the FPGA platform.
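A generic form of population encoding, in which a continuous value is mapped to spike times through overlapping Gaussian receptive fields, is sketched below; this is a common formulation and not necessarily the exact scheme of Fang et al. (2019).

```python
# Population encoding sketch: one spike time per encoding neuron (earlier = stronger).
import numpy as np

def population_encode(x, n_neurons=10, t_max=100.0, sigma=0.1):
    """Map x in [0, 1] to spike times of n_neurons with Gaussian receptive fields."""
    centers = np.linspace(0.0, 1.0, n_neurons)               # receptive-field centers
    response = np.exp(-0.5 * ((x - centers) / sigma) ** 2)   # activation in (0, 1]
    return t_max * (1.0 - response)                          # strong response -> early spike

print(population_encode(0.42))   # ten spike times, earliest near the centers closest to 0.42
```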
Table 6 presents the performance of recent FPGA-based implementations of SNNs in terms of network configuration, system performance, and target device.
Table 6: Summary of FPGA-Based Implementations of SNNs.
| Reference | Network Type/Neuron and Encoding Model/Topology | Recognition CA (%) | FPGA Platform | Software Tool/Language | System Performance |
|---|---|---|---|---|---|
| Corradi et al. (2021) | SNN based on conversion method | Optical radar 99.7 | Trenz TE0820-03-4DE21FA | Vivado | Achieves an energy efficiency of 0.151 nJ per synaptic operation (SO) |
| Panchapakesan et al. (2021) | SNN based on conversion method (VGG-13) | CIFAR-10 90.79, SVHN 96 | Xilinx ZCU102 | Vivado HLS | Achieves 13,086 frames per second at 200 MHz |
| Aung et al. (2021) | Spiking CNN (AlexNet) | CIFAR-10 81.8, SVHN 93.1 | Xilinx Virtex UltraScale VCU118 | N/A | Achieves 28.3 kFPS running at 425 MHz |
| Nallathambi and Chandrachoodan (2020) | Spiking CNN based on conversion method | CIFAR-10 76.87 | Intel Cyclone V | N/A | Reduces the number of off-chip memory accesses by close to 90% |
| Hong et al. (2020) | Time-delay neural network (TDNN) | CIFAR-10 83.43 | Xilinx Kintex-7 325T | Verilog | Consumes 4.92 W at a 160 MHz clock frequency |
| Fang et al. (2020) | SNN based on conversion rule/LIF neuron/population coding/784-600-10 | MNIST 97.7 | Cyclone V | N/A | Obtains 10× speed-up and 196× improvement in energy efficiency compared with a GPU |
| Han et al. (2020) | Feedforward SNN based on a hybrid of time-stepped and event-driven updating algorithms/LIF neuron/Poisson encoding/784-1200-1200-10 | MNIST 97.06 | Xilinx ZC706 | N/A | Achieves 161 FPS at a 200 MHz clock frequency and very low power consumption of 0.477 W |
| Ju et al. (2020) | Deep SNN based on conversion rule/IF neuron/fixed uniform encoding/28×28-64c5-2s-64c5-2s-128f-10o | MNIST 98.94 | Xilinx Zynq ZCU102 | N/A | Achieves 164 FPS at a 150 MHz clock frequency, with 41× speed-up and 4.6 W power consumption |
| Liu et al. (2019) | Liquid state machine (LSM) based on spike-timing-dependent plasticity (STDP) | TI46 speech corpus 95 | Xilinx Zynq ZC706 | N/A | Consumes 237 mW at a 100 MHz clock frequency |
| Losh and Llamocca (2019) | Spiking hybrid network (SHiNe) based on backpropagation learning/integrate, rectification, and fire neuron/fixed-frequency duty-cycle encoding/196-64-10 | MNIST 97.70 | Xilinx Zynq XC7Z010-1CLG400C | Vivado 2016.3 | Processes a frame in 65.536 µs at a 125 MHz clock rate, with 161 mW total power consumption |
| Guo (2019) | DNN-to-SNN conversion rule/IF neuron/Poisson encoding/28×28-12c5-2s-64c5-2s-10 (CNN), 784-1200-1200-10 (FCN) | MNIST 98.98 (CNN), 98.84 (FCN) | Xilinx V7 690T | Vivado 2016.4 | Consumes 0.745 W at a 100 MHz clock frequency, using 32-bit fixed-point precision |
| Kuang et al. (2019) | Three-layer SNN based on STDP learning rules/LIF neuron and conductance-based synapse | MNIST 93 | Stratix III | Verilog HDL | N/A |
| Abderrahmane and Miramond (2019) | SNN based on backpropagation learning/IF neuron/rate coding/784-300-300-10 | MNIST 97.70 | Intel Cyclone V | Quartus Prime Lite 18.1, VHDL | Achieves 256 FPS at a 50 MHz clock frequency with the time-multiplexed architecture and 70 kFPS with the fully parallel architecture |
| Zhang et al. (2019) | SNN based on the backpropagation algorithm/LIF neuron/256-256-10 | MNIST 96.26 | Terasic DE2-115 | Quartus II, Verilog | Obtains 10.7× speed-up and consumes 293 mW at 100 MHz |
| Liu et al. (2018) | Liquid state machine (LSM) based on spike-timing-dependent plasticity (STDP) | TI46 speech corpus 93.1, CityScape 97.9 | Xilinx Virtex-6 | N/A | Up to 29% more energy efficient for training and 30% more energy efficient for classification than the baseline |
| Yousefzadeh (2017) | Two-layer hybrid neural network/LIF neuron/Poisson encoding | E-MNIST 97.09 | Xilinx Spartan-6 | HDL | Achieves 58 kFPS at a 220 MHz clock frequency and consumes 363 mW, which is less than 7 µJ per frame |
| Mostafa et al. (2017) | Feedforward SNN trained using backpropagation/LIF neuron/temporal encoding/784-600-10 | MNIST 96.98 | Xilinx Spartan-6 LX150 | N/A | N/A |
| Chung et al. (2015) | Time-delay neural network | MNIST 97.64 | Xilinx Artix-7 | Vivado | Processes an input image in 156.8 µs at 160 MHz |
| Neil and Liu (2014) | Spiking deep belief network/LIF neuron/784-500-500-10 | MNIST 92 | Xilinx Spartan-6 LX150 | RTL | Achieves 152 ms processing time per image and 1.5 W power consumption at a 75 MHz clock frequency |
6 Challenges and Future Research Directions
Spiking neural networks are capable of modeling information processing in the brain, such as pattern recognition. They offer promising event-driven processing, fast inference, and low power consumption. Spiking CNNs offer high potential for classification tasks in low-power neuromorphic hardware, as they combine the spike-based computing of SNNs with the high accuracy of CNNs (Diehl et al., 2015). Additionally, deep SNNs offer a promising computational paradigm for improving energy efficiency and reducing classification latency. However, training spiking CNNs and deep SNNs remains challenging because of their nondifferentiable spiking dynamics. To address this problem, we provided an overview of state-of-the-art learning rules for SNNs in section 3. One solution is direct supervised learning, which offers reduced power consumption and a relatively straightforward training procedure. This strategy is based on backpropagation-like techniques (Lee et al., 2016) and conventional gradient descent. However, direct training strategies still lack efficiency and stability when coping with complex data sets. An alternative to direct supervised learning is converting a trained CNN to a SNN by transferring the CNN operations directly into a SNN equivalent. Various approaches have been employed to convert CNNs to SNNs, such as threshold rescaling (Xu et al., 2017), n-scaling weight mapping (Yang et al., 2020), and weights-thresholds balance (Wang, Xu, Yan, & Tang, 2020).
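At the heart of most conversion approaches is a weight-threshold balancing step, sketched below in a deliberately simplified form; the exact normalization rules differ across the cited methods, and the layer sizes and activation statistic here are assumed for illustration.

```python
# Simplified weight/threshold balancing for CNN-to-SNN conversion (common core only).
import numpy as np

def normalize_layer(weights, max_activation, v_thresh=1.0):
    """Scale a layer's weights so that the trained ANN's peak activation
    corresponds to the spiking neuron's firing threshold."""
    return weights * (v_thresh / max_activation)

# Example: a hypothetical layer whose largest ReLU activation on training data was 7.3.
w_ann = np.random.randn(128, 64)
w_snn = normalize_layer(w_ann, max_activation=7.3)
```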
The conversion rule has largely solved the learning issue for deep SNNs. However, it is not apparent that the conversion method can scale to deeper architectures and address complex tasks. Furthermore, there is a possibility of accuracy loss during the conversion of CNNs to SNNs. Another hardware-friendly approach is local learning rules, such as STDP (Kheradpisheh, Ganjtabesh, Thorpe, & Masquelier, 2018).
This method is a suitable design option and a biologically plausible learning algorithm for hardware implementation. Additionally, STDP is a good choice for online learning, allowing a fast, real-time learning process with reduced computational complexity.
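A minimal pair-based STDP update, with illustrative rather than prescribed parameter values, is sketched below: the weight is potentiated when the presynaptic spike precedes the postsynaptic spike and depressed otherwise, with an exponential dependence on the spike-time difference.

```python
# Pair-based STDP weight update (illustrative parameter values).
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Return the updated weight for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt >= 0:                                   # pre before post: potentiation
        w += a_plus * np.exp(-dt / tau_plus)
    else:                                         # post before pre: depression
        w -= a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w, w_min, w_max))

w = stdp_update(0.5, t_pre=10.0, t_post=14.0)     # causal pair, so the weight increases
```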
Spiking neural networks are poorly served by classical von Neumann computing architectures because of the dynamic nature of neurons; in addition, such architectures require an excessive amount of time and power. Thus, neuromorphic platforms are ideally suited for executing SNNs. These platforms offer better parallel implementation than CPUs and lower power consumption than GPUs. FPGAs offer a programmable and very flexible platform for SNN implementation. Compared to ASICs, FPGAs provide better stability, shorter design and fabrication time, and higher flexibility. FPGA implementations of SNNs achieve significantly lower power consumption than GPU implementations and better speed-up than CPU implementations (Abderrahmane & Miramond, 2019; Nallathambi & Chandrachoodan, 2020).
In the hardware implementation of deep SNNs, the number of neurons, connections, and weights can be very large, increasing the required memory size. FPGAs' on-chip memory is not sufficient to store all parameters of such networks, so an external memory, such as SRAM or DDR, is required alongside the on-chip memory to store the parameters and feed data into the architecture. Choosing a suitable information coding method and designing an effective architecture can therefore reduce memory fetches. Different architectures have been used for FPGA-based implementations of SNNs, such as fully parallel and time-multiplexed designs; the choice of architecture depends on the target application.
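A back-of-the-envelope estimate, assuming a fully connected topology and 16-bit weights (both assumptions, not figures from any cited design), illustrates why on-chip memory alone is often insufficient.

```python
# Rough parameter-memory estimate for a fully connected SNN topology.
def weight_memory_bytes(layer_sizes, bits_per_weight=16):
    """Total synaptic weight storage for a fully connected topology."""
    synapses = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    return synapses * bits_per_weight // 8

# Example: a 784-1200-1200-10 network with 16-bit weights needs roughly 4.8 MB,
# which already exceeds the block RAM of many mid-range FPGAs.
print(weight_memory_bytes([784, 1200, 1200, 10]) / 1e6, "MB")
```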
Focusing on the advancement of SNNs and their neuromorphic implementations, the following research aspects need to be considered, and more work is required to resolve the remaining challenges and limitations:
One of the key challenges in developing SNNs is to deploy suitable training and learning algorithms, which profoundly affect application accuracy and execution cost.
Another unsolved challenge is how information is encoded with spikes. Although neural coding has a remarkable effect on the performance of SNNs, open questions remain as to what the best encoding approach is and how to develop a learning algorithm that is well matched to the encoding scheme. Designing a learning algorithm capable of training the hidden neurons of an interconnected SNN remains a major challenge.
Neuromorphic computing is at an early stage, and much progress is needed in both algorithms and hardware before systems capable of exhibiting human-like intelligence can be realized.
7 Conclusion
Spiking neural networks are considered the third generation of neural networks, offering high-speed, real-time implementation of complex problems in a bio-inspired, power-efficient manner. This review offered an overview of recent strategies for training SNNs and highlighted two popular deep learning approaches, spiking CNNs and deep fully connected SNNs, in terms of their learning rules, network architectures, experiments, and recognition accuracy. It also discussed current SNN simulators, comparing the three main approaches (clock-driven, event-driven, and hybrid); presented a survey of work on hardware implementations of SNNs; and demonstrated that FPGAs are a promising candidate for accelerating SNNs, achieving better speed-up than CPUs and lower energy consumption than GPUs.