Abstract
Spiking neural networks (SNNs) have emerged as a promising alternative to traditional deep neural networks for low-power computing. However, the effectiveness of SNNs is not solely determined by their performance but also by their energy consumption, prediction speed, and robustness to noise. The recent method Fast & Deep, along with others, achieves fast and energy-efficient computation by constraining neurons to fire at most once. Known as time-to-first-spike (TTFS), this constraint, however, restricts the capabilities of SNNs in many aspects. In this work, we explore the relationships of performance, energy consumption, speed, and stability when using this constraint. More precisely, we highlight the existence of trade-offs where performance and robustness are gained at the cost of sparsity and prediction latency. To improve these trade-offs, we propose a relaxed version of Fast & Deep that allows for multiple spikes per neuron. Our experiments show that relaxing the spike constraint provides higher performance while also benefiting from faster convergence, similar sparsity, comparable prediction latency, and better robustness to noise compared to TTFS SNNs. By highlighting the limitations of TTFS and demonstrating the advantages of unconstrained SNNs, we provide valuable insight for the development of effective learning strategies for neuromorphic computing.
1 Introduction
Over the past decade, deep neural networks (DNNs) have become indispensable tools in statistical machine learning, achieving state-of-the-art performance in various applications, including computer vision (Krizhevsky et al., 2012; Szegedy et al., 2013), natural language processing (Vaswani et al., 2017; Devlin et al., 2018; Brown et al., 2020), and reinforcement learning (Mnih et al., 2013, 2016). However, their impressive performance often comes at a significant hardware and energy cost. For example, natural language processing models can consist of billions of parameters and require energy-intensive GPU clusters to train efficiently (Brown et al., 2020). These hardware and energy requirements pose a significant challenge in terms of sustainability and restrict the practical applicability of DNNs in resource-limited environments such as low-powered edge devices. Therefore, exploring more energy-efficient alternatives to DNNs is crucial not only to address the environmental cost of machine learning but also to provide practical and sustainable solutions in edge computing.
One possible alternative to DNNs is spiking neural networks (SNNs). Spiking neurons process information through discrete spatiotemporal events known as spikes rather than continuous real-number values (Maass, 1997; Gerstner & Kistler, 2002). Spikes enable efficient implementations of neural networks on non-von Neumann neuromorphic hardware such as Intel Loihi, IBM TrueNorth, BrainScaleS-2, and SpiNNaker (Painkras et al., 2013; Akopyan et al., 2015; Schmitt et al., 2017; Davies et al., 2018; Furber, 2016; Hendy & Merkel, 2022). Such hardware consumes only a fraction of the power required by DNNs on von Neumann computers and thus represents a suitable solution for energy-efficient edge computing (Blouw et al., 2019; Taunyazov et al., 2020).
The power consumption of neuromorphic hardware is closely related to the number of spikes it processes. Therefore, sparse SNNs that fire only a small number of spikes achieve high energy efficiency on hardware. However, such networks also transmit less information, creating a trade-off between energy consumption and model accuracy. In addition to sparsity, various trade-offs between performance and other aspects of SNNs are commonly explored in the literature (Park et al., 2021; Yin et al., 2023; Li et al., 2021; Diehl et al., 2015).
For instance, unsupervised learning rules such as spike-timing-dependent plasticity (STDP) can be implemented directly on neuromorphic hardware, allowing for biologically plausible and energy-efficient training of SNNs that are resilient to the substrate noise of analog circuits (Kim et al., 2020). However, the performance of unsupervised learning lags behind that achieved with supervised learning. Meanwhile, state-of-the-art performance with SNNs is currently achieved through various error backpropagation (BP) techniques adapted from deep learning (Bohté et al., 2000; Lee et al., 2016; Jin et al., 2018; Kheradpisheh & Masquelier, 2020; Kheradpisheh et al., 2021; Zhang et al., 2022; Shrestha & Orchard, 2018; Wu et al., 2018; Mostafa, 2016; Comsa et al., 2020; Göltz et al., 2021; Fang et al., 2021). However, BP algorithms often require constraints on spikes to achieve high sparsity and low prediction latency, at the cost of performance (Yan et al., 2022; Mostafa, 2016; Guo et al., 2020; Kheradpisheh & Masquelier, 2020; Kheradpisheh et al., 2021; Göltz et al., 2021). In addition, BP requires global transport of information that is incompatible with neuromorphic hardware, and training must be performed either offline or in the loop, where a conventional computer is used in conjunction with neuromorphic hardware (Schmitt et al., 2017; Göltz et al., 2021). Therefore, SNNs trained offline or in the loop must also be resilient to the substrate noise and weight quantization of analog hardware to avoid performance drops at deployment.
One particularly interesting approach for training fast, energy-efficient, and noise-resilient SNNs is Fast & Deep (Göltz et al., 2021). This exact BP method employs time-to-first-spike (TTFS) coding, which restricts neurons to fire only once. Inspired by the human visual system (Thorpe et al., 1996), TTFS is based on the idea that the first spikes of neurons must carry most of the information about input stimuli, enabling fast, sparse, and energy-efficient computation. Due to this constraint on firing, we refer to TTFS networks as constrained SNNs. However, relaxing the spike constraint of TTFS and allowing multiple spikes per neuron typically results in higher information rates, better performance, and increased noise resilience compared to TTFS networks (Jin et al., 2018; Zhang & Li, 2019; Shrestha & Orchard, 2018; Lee et al., 2016; Zhang et al., 2022). One might assume that such networks, referred to as unconstrained SNNs, would improve performance and noise robustness but also result in slower inference and lower energy efficiency due to increased firing rates.
In this work, we explore the trade-offs among performance, convergence, energy consumption, prediction speed, and robustness of SNNs, with and without the spike constraint imposed by TTFS. We make the following main contributions:
We demonstrate that many properties of Fast & Deep networks are driven by the initial weight distribution, highlighting trade-offs among performance, energy consumption, latency, and stability in TTFS SNNs.
We extend the Fast & Deep algorithm to multiple spikes per neuron and describe how errors are backpropagated in unconstrained SNNs.
We show that our proposed method improves performance while providing better convergence rate, similar sparsity, comparable latency, and improved robustness to noise compared to Fast & Deep, suggesting that relaxing the spike constraints in TTFS can lead to better trade-offs.
2 Method
In this section, we describe our generalization of the Fast & Deep algorithm to multiple spikes per neuron. Our main contribution lies in the reset of the membrane potential and how errors are backpropagated through interneuron and intraneuron dependencies.
2.1 The CuBa LIF Neuron
We consider a neural network of current-based leaky integrate-and-fire neurons with a soft reset of the membrane potential (Gerstner & Kistler, 2002; Davies et al., 2018; Göltz et al., 2021).
The reset of the membrane potential in equation 2.1 is the major difference with the TTFS model used in Fast & Deep (Göltz et al., 2021). By resetting the membrane potential after postsynaptic spikes, our model allows for further integration of inputs and thus the firing of several spikes. Therefore, this relaxes the constraint on spike counts imposed by Fast & Deep.
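To illustrate the effect of this reset, the following minimal clock-based sketch simulates a single CuBa LIF neuron with a soft reset; the time constants, threshold, and discretization are illustrative assumptions (the models in this work are simulated event-based using the closed-form solution of section 2.3).

```python
import numpy as np

def simulate_cuba_lif(input_events, weights, tau_s=0.130, tau_m=0.260,
                      theta=0.1, dt=1e-3, t_max=0.3):
    """Clock-based sketch of a CuBa LIF neuron with a soft reset.

    input_events: list of (time, synapse_index) input spikes.
    weights: one synaptic weight per input synapse.
    Returns the list of output spike times.
    """
    events = sorted(input_events)
    i_syn, v_mem, k = 0.0, 0.0, 0
    out_spikes = []
    for step in range(int(t_max / dt)):
        t = step * dt
        # deliver the input spikes that occurred up to the current time step
        while k < len(events) and events[k][0] <= t:
            i_syn += weights[events[k][1]]
            k += 1
        # exponential decay of the synaptic current and membrane potential,
        # then integration of the current into the membrane
        i_syn *= np.exp(-dt / tau_s)
        v_mem = v_mem * np.exp(-dt / tau_m) + dt * i_syn
        # soft reset: subtract the threshold instead of clamping to rest,
        # so the neuron keeps integrating its inputs and can fire again
        if v_mem >= theta:
            out_spikes.append(t)
            v_mem -= theta
    return out_spikes

# toy usage: with a low threshold, two strong inputs elicit more than one spike
print(simulate_cuba_lif([(0.01, 0), (0.02, 1)], weights=[2.0, 2.0]))
```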
2.2 SRM Mapping
2.3 Closed-Form Solution of Spike Timing
This equation can thus be used to infer the spike trains of neurons in an event-based manner.
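As a concrete illustration of event-based inference, the sketch below evaluates a closed-form threshold crossing in the special case τ_m = 2τ_s with an unnormalized PSP kernel, where the threshold condition reduces to a quadratic in exp(−t/τ_m). The kernel normalization, the form of the reset term, and the root selection are our assumptions, so this is a schematic rather than the exact expressions derived in the paper.

```python
import numpy as np

TAU_S = 0.130           # synaptic time constant (value from appendix A.4)
TAU_M = 2.0 * TAU_S     # membrane time constant, assuming tau_m = 2*tau_s

def next_crossing(a, b, theta):
    """First threshold crossing of V(t) = a*exp(-t/TAU_M) - b*exp(-t/TAU_S).

    With tau_m = 2*tau_s, setting y = exp(-t/TAU_M) turns V(t) = theta into
    the quadratic b*y**2 - a*y + theta = 0; the larger root lies on the
    rising part of the potential.
    """
    if b <= 0:
        return None                         # simplification: case ignored here
    disc = a * a - 4.0 * b * theta
    if disc < 0:
        return None                         # the threshold is never reached
    y = (a + np.sqrt(disc)) / (2.0 * b)
    return -TAU_M * np.log(y) if y > 0 else None

def event_based_spikes(in_times, in_weights, theta, max_spikes=30):
    """Generate output spikes event by event with a soft reset."""
    in_times = np.asarray(in_times, float)
    in_weights = np.asarray(in_weights, float)
    # input contributions to the two exponential coefficients
    a = float(np.sum(in_weights * np.exp(in_times / TAU_M)))
    b = float(np.sum(in_weights * np.exp(in_times / TAU_S)))
    t_last, out = float(np.max(in_times)), []
    for _ in range(max_spikes):
        t = next_crossing(a, b, theta)
        if t is None or t < t_last:
            break
        out.append(t)
        # soft reset in SRM form (our assumption): subtract a kernel of
        # height theta that decays with the membrane time constant
        a -= theta * np.exp(t / TAU_M)
        t_last = t
    return out

print(event_based_spikes([0.01, 0.02], [2.0, 2.0], theta=0.5))  # two spikes
```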
2.4 Spike Count Loss Function
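The loss in this section is defined on output spike counts. As a rough illustration consistent with the spike-count targets reported in appendix A.4 (15 spikes for the target label and 3 for the others), a squared-error loss on per-neuron spike counts could take the following form; this is a plausible sketch and not necessarily the exact loss used in this work.

```python
import numpy as np

def spike_count_loss(spike_counts, label, target_true=15, target_false=3):
    """Squared-error loss on output spike counts (illustrative sketch).

    spike_counts: spike count of each output neuron for one sample.
    label: index of the true class.
    Returns the loss and the per-neuron errors; a neuron that fires above
    its target receives a negative error, consistent with the mechanism
    discussed in section 3.2.
    """
    counts = np.asarray(spike_counts, dtype=float)
    targets = np.full_like(counts, float(target_false))
    targets[label] = float(target_true)
    errors = targets - counts
    loss = 0.5 * np.sum(errors ** 2)
    return loss, errors

# toy usage: the true class (index 0) under-fires, the second class over-fires
print(spike_count_loss([10, 8, 0], label=0))   # errors: [ 5. -5.  3.]
```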
2.5 Gradient of Unconstrained Neurons
Because spike timing now has a closed-form solution, it is differentiable, which allows the computation of an exact gradient. We first state the total weight change between two neurons.
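To make this concrete, the sketch below differentiates the illustrative closed-form spike time of section 2.3 (τ_m = 2τ_s, unnormalized kernel) with respect to a single input weight and checks the result against a finite difference; the expressions are ours and only demonstrate the mechanics of differentiating spike times, not the full backpropagation rules derived here.

```python
import numpy as np

TAU_S = 0.130
TAU_M = 2.0 * TAU_S

def spike_time(times, weights, theta):
    """Closed-form first spike time for tau_m = 2*tau_s (illustrative)."""
    a = np.sum(weights * np.exp(times / TAU_M))
    b = np.sum(weights * np.exp(times / TAU_S))
    s = np.sqrt(a * a - 4.0 * b * theta)
    y = (a + s) / (2.0 * b)
    return -TAU_M * np.log(y)

def d_spike_time_d_weight(times, weights, theta, i):
    """Exact derivative of the spike time with respect to weight i."""
    a = np.sum(weights * np.exp(times / TAU_M))
    b = np.sum(weights * np.exp(times / TAU_S))
    s = np.sqrt(a * a - 4.0 * b * theta)
    y = (a + s) / (2.0 * b)
    dy_da = (1.0 + a / s) / (2.0 * b)
    dy_db = -theta / (b * s) - y / b
    da_dw = np.exp(times[i] / TAU_M)     # a and b are linear in each weight
    db_dw = np.exp(times[i] / TAU_S)
    return -TAU_M / y * (dy_da * da_dw + dy_db * db_dw)

# sanity check against a finite difference
times = np.array([0.01, 0.02])
weights = np.array([2.0, 2.0])
eps = 1e-6
analytic = d_spike_time_d_weight(times, weights, 0.5, i=0)
bumped = weights.copy(); bumped[0] += eps
numeric = (spike_time(times, bumped, 0.5) - spike_time(times, weights, 0.5)) / eps
print(analytic, numeric)   # the two values should agree closely
```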
2.6 Spike Errors
Illustration of error backpropagation through spikes. This figure represents a three-layered network where the spike trains of only one neuron per layer are shown. The gray arrows represent the error coming from the loss function, the dashed blue arrows are the errors backpropagated from the downstream spikes (i.e., interneuron dependencies), and the red arrows are the error backpropagated from the future activity of the neuron due to the recurrence of the reset function (i.e., intraneuron dependencies).
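As a schematic of this dependency structure, the sketch below accumulates the error assigned to each spike of a single neuron by walking backward through its spike train, combining the direct loss error (gray), the errors received from downstream spikes (dashed blue, interneuron), and the error propagated from the neuron's own next spike through the reset (red, intraneuron). The derivative values are toy placeholders, so this only illustrates the bookkeeping, not the exact derivative expressions.

```python
import numpy as np

def backpropagate_spike_errors(loss_err, downstream_err, dt_next_dt):
    """Accumulate the error of each spike of a single neuron (schematic).

    loss_err[k]       : direct error from the loss on spike k (gray arrows).
    downstream_err[k] : summed errors arriving from downstream spikes that
                        were influenced by spike k (dashed blue arrows).
    dt_next_dt[k]     : derivative of spike k+1's timing with respect to
                        spike k's timing, induced by the soft reset
                        (red arrows); the last entry is unused.
    Returns one accumulated error per spike, computed backward in time.
    """
    n = len(loss_err)
    err = np.zeros(n)
    for k in reversed(range(n)):
        err[k] = loss_err[k] + downstream_err[k]
        if k + 1 < n:
            # intraneuron dependency: spike k shifts the neuron's next spike
            # through the reset, so the later error flows back as well
            err[k] += dt_next_dt[k] * err[k + 1]
    return err

# toy numbers for a neuron that fired three spikes
print(backpropagate_spike_errors(loss_err=[0.0, 0.0, -0.2],
                                 downstream_err=[0.1, -0.3, 0.0],
                                 dt_next_dt=[0.5, 0.4, 0.0]))
```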
3 Results
In this section, we compare our proposed method with Fast & Deep. This evaluation was conducted based on multiple criteria, including performance on benchmark data sets, convergence rate, sparsity, classification latency, and robustness to noise and weight quantization.
Experimental conditions were standardized for both methods except for weight distributions and thresholds. Two uniform weight distributions, w_{i,j} ∼ U(−1, 1) and w_{i,j} ∼ U(0, 1), were used to evaluate Fast & Deep in order to measure the effect of the initial weight distribution on the different evaluation criteria. Our method was assessed only with w_{i,j} ∼ U(−1, 1), as positive initial weights lead to excessive spiking activity, hindering computational and energy efficiency. Thresholds were manually tuned to find the best-performing networks and kept fixed during training. In all our experiments, every layer (including convolutional layers) was trained directly using our proposed method or Fast & Deep, and no conversion from DNN to SNN was performed. More details about our experimental settings are given in the appendixes.
3.1 Performance and Convergence Rate
To assess the performance of our proposed method, we trained fully connected SNNs on the MNIST (LeCun et al., 2010), EMNIST (Cohen et al., 2017) (Balanced Extended MNIST), and Fashion-MNIST (Xiao et al., 2017) data sets as well as convolutional SNNs on MNIST. We also evaluated our method on temporal data classification by training fully connected networks on the Spiking Heidelberg Digits (SHD) data set (Cramer et al., 2022), an English and German spoken digits classification task. We compared our results to those obtained using the original Fast & Deep algorithm. Table 1 summarizes the average test accuracies of both methods given the considered initial weight distributions. For completeness, a comparison of Fast & Deep and our method with other spike-based BP algorithms is in the appendix.
Table 1: Performance Comparisons between Fast & Deep and Our Method on the MNIST, EMNIST, Fashion-MNIST, and Spiking Heidelberg Digits (SHD) Data Sets.
| Data Set | Architecture | Fast & Deep U(−1, 1) | Fast & Deep U(0, 1) | Our Method U(−1, 1) |
|---|---|---|---|---|
| MNIST | 800-10 | 96.76 ± 0.17% | 97.83 ± 0.08% | **98.88 ± 0.02%** |
| MNIST | Conv. | 99.01 ± 0.16% | 99.22 ± 0.05% | **99.38 ± 0.04%** |
| EMNIST | 800-47 | 69.56 ± 6.70% | 83.34 ± 0.27% | **85.75 ± 0.06%** |
| Fashion-MNIST | 400-400-10 | 88.14 ± 0.08% | 88.47 ± 0.20% | **90.19 ± 0.12%** |
| SHD | 128-20 | 33.84 ± 1.35% | 47.37 ± 1.65% | **66.8 ± 0.76%** |
Notes: The initial weight distribution used in each column is specified in the header row of the table. Conv. refers to a convolutional SNN with the following architecture: 15C5-P2-40C5-P2-300-10. The numbers in bold indicate the highest accuracy obtained in each row.
First, it should be noted that the Fast & Deep algorithm generally achieves better performance when the weights are initialized with positive values, which is consistent with the choice of weight distribution made by Göltz et al. (2021). Second, our proposed method demonstrates superior performance compared to the Fast & Deep algorithm, with improvement margins ranging from 1.05% on MNIST to 2.41% on the more difficult EMNIST data set. This is not surprising given that unconstrained SNNs are known to perform better compared to TTFS networks (Jin et al., 2018; Zhang & Li, 2019; Shrestha & Orchard, 2018; Lee et al., 2016; Zhang et al., 2022). Moreover, our method outperforms Fast & Deep by at least 19% on the SHD data set with a single hidden layer of 128 neurons. Note that no recurrent connections were used in these experiments. To understand the reason for this performance gap, we analyzed the spiking activity in the hidden layers after training. Figure 3 shows that TTFS neurons respond only to early stimuli. In temporal coding, high-valued information is encoded by early spikes. Training with a one-spike constraint is therefore energy efficient but tends to make SNNs spike as early as possible, thus missing the information occurring later in time. In contrast, neurons trained without spike constraint are able to respond throughout the duration of the sample, thus capturing all the information despite an increased number of spikes fired. This highlights the importance of firing more than once and demonstrates that a trade-off exists between performance and energy consumption when processing temporal data.
Test accuracy of Fast & Deep and our method on MNIST for the same learning rate. The unconstrained SNN trained with our method benefits from a higher convergence rate than the temporally-coded networks trained with Fast & Deep. The two SNNs trained with Fast & Deep have similar convergence rates despite their difference in initial weight distribution.
Spiking activity of hidden neurons in a SNN trained with our method (panel b) and Fast & Deep (panels c and d) given a spoken “zero” (panel a) from the SHD data set. TTFS neurons in Fast & Deep mainly respond to early stimuli, missing most of the input information. In contrast, our method allows for multiple spikes per neuron, which enables them to capture all the information from the inputs. This demonstrates the importance of relaxing the spike constraint of TTFS when processing temporal data.
In addition, our results indicate that the convergence rate of SNNs with multiple spikes is higher compared to TTFS networks. Figure 2 depicts the evolution of the test accuracy of both methods on MNIST. This demonstrates that our method can reach desired accuracies in fewer epochs compared to Fast & Deep. Such improvement implies that discriminative features are learned earlier during training.
3.2 Network Sparsity
Achieving high sparsity in trained SNNs is critical for energy efficiency, as neuromorphic hardware consumes energy only at spike events.
Figure 4 shows the population spike counts of fully connected SNNs trained on MNIST using both methods. It appears that the initial weight distribution plays an important role in the final sparsity level of the trained SNNs. In the analyzed case, SNNs initialized with positive weights appear to be less sparse after training than SNNs initialized with both negative and positive weights.
Comparison of the population spike count and the number of active neurons in fully connected SNNs trained using Fast & Deep and our proposed method on the MNIST data set. These results indicate that the sparsity of SNNs after training depends on the weight distribution, with the SNNs initialized with only positive weights appearing to be less sparse than those initialized with both negative and positive weights. In addition, for the same initial weight distribution, our proposed method demonstrates a similar level of sparsity to Fast & Deep and fewer active neurons, despite the relaxed constraint on neuron spike counts.
While our proposed method allows for an increased number of spikes per neuron, which implies more energy consumption, we found that it can achieve similar sparsity as TTFS networks initialized with both negative and positive weights while performing better than TTFS networks initialized with positive weights only (see Figure 4 and Table 1). This suggests that our proposed method can offer improved trade-offs between accuracy and sparsity.
To understand why our method can achieve such levels of sparsity despite not imposing any constraints on neuron firing, we analyzed the average activity in each network. Figure 5 shows that neurons trained with Fast & Deep fire indiscriminately in response to any input digit, which is characteristic of temporal coding where information is represented by the timing of spikes rather than the presence or absence of spikes. In contrast, SNNs trained with our method exhibit a different distribution of firing activity, with certain key neurons selectively responding to specific digits. Figure 4 indicates that only 7% of neurons trained with our method are active during inference (i.e., neurons that fire at least once). In comparison, the SNN trained with Fast & Deep and initialized with both negative and positive weights has 24% of its neurons firing. Therefore, the reduced proportion of active units in our method compensates for the increased number of spikes per neuron, leading to fewer spikes emitted in the network.
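For reference, the sparsity figures discussed here can be computed from a per-sample spike-count matrix as in the short sketch below; the array layout, the averaging over samples, and the definition of an active neuron as one that fires for at least one test sample are our assumptions.

```python
import numpy as np

def sparsity_metrics(spike_counts):
    """spike_counts: array of shape (n_samples, n_neurons), the number of
    spikes each hidden neuron fired for each test sample."""
    spike_counts = np.asarray(spike_counts)
    # average number of spikes emitted by the whole population per sample
    population_count = spike_counts.sum(axis=1).mean()
    # fraction of neurons that fire at least once over the test set
    active_fraction = (spike_counts.sum(axis=0) > 0).mean()
    return population_count, active_fraction

# toy example: 4 samples, 5 neurons, only two neurons ever fire
counts = np.array([[0, 3, 0, 0, 1],
                   [0, 2, 0, 0, 0],
                   [0, 4, 0, 0, 2],
                   [0, 1, 0, 0, 0]])
print(sparsity_metrics(counts))   # 3.25 spikes per sample, 40% active neurons
```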
Panels a and b show the average spike count of hidden neurons trained with Fast & Deep on the MNIST data set while panel c shows the average spike count of hidden neurons trained with our proposed method. Each row corresponds to the average activity over all the test samples of a particular digit. As TTFS networks mainly encode information temporally, we observe that neurons trained with Fast & Deep fire indiscriminately in response to stimuli, making it difficult to differentiate the labels from the mean spike count, regardless of the initial weight distribution. However, our proposed SNN training method results in a different distribution of firing activity. More precisely, key neurons respond selectively to particular digits, while most of the other neurons remain mostly silent.
We found during our experiments that the sparsity of SNNs trained with our method was also influenced by the choice of threshold values. More precisely, we observed that decreasing the output threshold resulted in a reduction of activity in the network, as illustrated in Figure 6a. This decrease in activity occurred because the output neurons fired more often, as illustrated in Figure 6b. The increased number of output spikes caused the loss function to produce more negative errors, which resulted in negative weight changes in the hidden layer. Lowering the threshold thus has a dual impact on activity: it increases the firing rates at initialization but also contributes to producing more negative errors, which can decrease activity during learning. While controlling sparsity using thresholds is trivial in shallow SNNs, a fine balance between thresholds has to be found to control sparsity in deep SNNs, making it challenging to achieve sparsity with larger architectures. Nevertheless, our findings suggest that threshold values play a crucial role in determining the sparsity of unconstrained SNNs and can be seen as a way to control activity without the need for firing rate regularization.
Influence of the output threshold on the sparsity of a two-layer SNN trained on MNIST with our method. Panel a illustrates that a lower output threshold results in fewer spikes generated after one epoch. Panel b indicates that decreasing the output threshold increases the initial activity in the output layer, thereby leading to a greater number of negative errors transmitted during the backward pass (as shown in panel c). This, in turn, leads to a decrease in the weights in the hidden layer, as depicted in panel d.
3.3 Prediction Latency
We also investigated the prediction latency of SNNs trained using different methods. This corresponds to the amount of simulation time the models need to reach full confidence in their predictions and determines the simulation duration required to achieve high accuracy.
Figure 7a shows the averaged prediction confidence over time for SNNs trained using each method. We found that when initialized with only positive weights, Fast & Deep achieved high confidence on predictions after more than 150 ms. However, when initialized with both negative and positive weights, both Fast & Deep and our proposed method achieved confidence earlier (in 20 ms and 50 ms, respectively). This suggests that initialization has a significant impact on prediction latency. However, SNNs trained with our proposed method are slightly slower than TTFS networks trained with Fast & Deep for the same weight distribution due to the increased spike count per neuron.
(a) The evolution of the average prediction confidence during simulations on the MNIST test set. To produce this figure, we measured the probability of predictions at a time t being equal to the final predictions at the end of the simulations. The vertical dotted line represents the end of input spikes. The initial weight distribution seems to have a crucial impact on the latency of predictions. More precisely, negative initial weights produce confidence earlier than positive initial weights. Therefore, simulation time can be reduced to further improve sparsity. (b) The relationship between spike count and accuracy as the simulation time increases. It demonstrates that the duration of simulation can be used as a posttraining method to further reduce energy consumption while maintaining high performance.
Despite this difference, the short latency of these networks allows for shorter simulation durations, which can further improve sparsity without affecting performance. In Figure 7b, we show the relationship between population spike count and accuracy for each SNN as simulation time increases from 0 to 200 milliseconds. This demonstrates that by reducing the simulation time, SNNs can become sparser while maintaining high performance. Therefore, the sparsity-accuracy trade-off can be further improved after training by adjusting the simulation duration. This also demonstrates that our method can match the sparsity level of Fast & Deep while still performing better.
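To make the latency metric concrete, the sketch below estimates, for each time step, the fraction of samples whose running prediction already matches the final prediction at the end of the simulation; the cumulative spike-count decision rule and the array layout are our assumptions.

```python
import numpy as np

def prediction_confidence_over_time(output_spike_counts):
    """output_spike_counts: array of shape (n_samples, n_steps, n_classes)
    giving, per time step, the spikes fired by each output neuron.
    Returns, for each time step, the fraction of samples whose running
    prediction already equals the final prediction."""
    cumulative = np.cumsum(output_spike_counts, axis=1)
    running_pred = cumulative.argmax(axis=2)            # (n_samples, n_steps)
    final_pred = running_pred[:, -1:]                   # prediction at the end
    return (running_pred == final_pred).mean(axis=0)    # confidence per step

# toy example: 2 samples, 3 time steps, 2 output classes
counts = np.array([[[1, 0], [0, 1], [0, 2]],    # prediction flips to class 1
                   [[2, 0], [1, 0], [0, 0]]])   # class 0 from the start
print(prediction_confidence_over_time(counts))  # confidence rises toward 1.0
```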
3.4 Robustness to Noise and Weight Quantization
Analog neuromorphic hardware is inherently noisy and often limited to specific ranges and resolutions of weights. Having a model that is robust to noise and weight quantization is therefore important to achieve high performance on such systems. To assess the robustness of each method, we measured the impact of spike jitter, weight clipping, and weight precision on their accuracy. In these experiments, performance was normalized with the maximum accuracy of each method to better compare variations.
Figure 8a shows the impact of spike jitter on the performance of each method. These results were produced by artificially adding normally distributed noise to spike timings during training. When initialized with both negative and positive weights, Fast & Deep appears to be less robust than when initialized with only positive weights. With negative weights, only a fraction of neurons transmits information, which leads to increased sparsity, as illustrated in Figure 4. Therefore, introducing noise to spike timings has a significant impact on performance. In contrast, positive weights ensure consistent network activity and redundancy in the transmitted information. SNNs initialized with positive weights are thus less affected by spike jitter. However, SNNs trained with Fast & Deep remain susceptible to noise even with positive initial weights, as perturbations in spike timing still have a critical impact on temporal coding. In contrast, our proposed method demonstrates greater robustness to spike jitter than Fast & Deep, with minimal variation observed. This is a result of the redundancy created by the multiple spikes fired by each neuron.
(a) The effect of spike jitter on the performance of each method. This was achieved by introducing artificial noise to the spike timings, following a normal distribution. (b) The impact of weight clipping, which involved restricting weights to the range [−w_clip, w_clip] during training. (c) The effect of weight precision, which was obtained by discretizing weights into 2^(n+1) − 1 bins (n bits plus one bit for the sign of the synapse) within the range [−1, 1]. Overall, our method was found to be more resilient to noise and reduced weight precision than Fast & Deep.
In Figure 8b, we demonstrate the effect of weight range on performance by clipping weights to the range [−w_clip, w_clip] during training. The performance of Fast & Deep initialized with positive weights degrades when w_clip is lower than 1.5. However, both our method and Fast & Deep exhibit robustness to reduced weight ranges when initialized with both negative and positive weights. This suggests that the weight distribution may play a role in the network's resilience to limited weight ranges.
Finally, Figure 8c shows the performance of each method with reduced weight resolutions from 5 to 2 bits (results with float precision are also given as a reference). It highlights that Fast & Deep is less robust to reduced weight precision than our method, particularly with negative weights. In contrast, our approach is only slightly affected by the decreased precision, even when reduced to as low as 2 bits.
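For reference, the sketch below shows one way to implement the three perturbations used in these experiments: Gaussian jitter added to spike timings, clipping of weights to [−w_clip, w_clip], and quantization of weights into 2^(n+1) − 1 levels over [−1, 1] (n bits plus one sign bit). The noise standard deviation and the rounding scheme are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter_spikes(spike_times, sigma=0.003):
    """Add zero-mean Gaussian noise to spike timings (sigma is illustrative)."""
    noisy = spike_times + rng.normal(0.0, sigma, size=spike_times.shape)
    return np.clip(noisy, 0.0, None)          # keep timings non-negative

def clip_weights(weights, w_clip=0.5):
    """Restrict weights to the range [-w_clip, w_clip]."""
    return np.clip(weights, -w_clip, w_clip)

def quantize_weights(weights, n_bits=4):
    """Discretize weights into 2**(n_bits + 1) - 1 uniform levels on [-1, 1]."""
    n_levels = 2 ** (n_bits + 1) - 1
    step = 2.0 / (n_levels - 1)               # spacing between adjacent levels
    return np.round(np.clip(weights, -1.0, 1.0) / step) * step

w = rng.uniform(-1.0, 1.0, size=5)
print(jitter_spikes(np.array([0.01, 0.02])))
print(clip_weights(w), quantize_weights(w, n_bits=2))
```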
4 Discussion
In this work, we explored the trade-offs between performance and various aspects of TTFS SNNs such as sparsity, classification latency, and robustness to noise and weight quantization. We also generalized the Fast & Deep algorithm by incorporating a reset of the membrane potential, which enables multiple spikes per neuron, and compared the proposed method with the original algorithm on those trade-offs.
We found that initializing Fast & Deep with positive weights leads to better generalization capabilities compared to initializing with both negative and positive weights. This observation is consistent across the benchmarked data sets, as shown in Table 1. However, relaxing the spike constraint improves the overall performance and convergence rate of SNNs, at least on the benchmark problems we considered. This result was expected, as BP methods that use multiple spikes per neuron generally perform better than methods that impose firing constraints (Jin et al., 2018; Zhang & Li, 2019; Shrestha & Orchard, 2018; Lee et al., 2016; Zhang et al., 2022).
Our experiments also demonstrate that the weight distribution significantly influences the sparsity of Fast & Deep. We observed that SNNs with positive weight initialization tend to be less sparse than those initialized with weights between −1 and 1. However, the former consistently outperforms the latter in terms of performance. This highlights the accuracy-sparsity trade-off often observed when training SNNs (Yin et al., 2023; Li et al., 2021). The quasi-dense activity provided by positive weights explains the difference in sparsity, as shown in Figure 5b. In contrast, initializing Fast & Deep with both negative and positive weights leads to fewer active neurons due to the inhibition provided by negative weights. Additionally, neurons trained with Fast & Deep fire indiscriminately to stimuli, suggesting a pure temporal representation of information, whereas neurons trained with our proposed method selectively respond to their inputs and exhibit a different distribution of activity, as shown in Figure 5c. Our unconstrained SNNs allow for a different distribution of the spike activity, whereby key neurons can fire more often than others, while irrelevant neurons may not spike at all. This enables our method to achieve a level of sparsity comparable to Fast & Deep on image classification, as illustrated in Figure 4.
To achieve a high degree of sparsity without firing rate regularization, thresholds can be tuned to indirectly influence spiking activity through learning. Decreasing thresholds increases the firing rate of downstream layers, resulting in more negative errors at outputs and consequently negative weight changes in hidden layers, as shown in Figure 6. This mechanism offers a natural way to control sparsity in unconstrained SNNs without requiring any regularization technique. By leveraging this principle, we were able to train shallow SNNs with a sparsity level similar to Fast & Deep while achieving higher performance. This implies that allowing multiple spikes per neuron has the potential to enhance the accuracy-sparsity trade-off and prompts further investigation into the effectiveness of TTFS in achieving efficient computation. However, finding thresholds that lead to high sparsity is more difficult when networks become deeper due to the fluctuations in the firing rates of each layer. The factors that influence sparsity in unregularized SNNs are currently not fully understood and present an opportunity for future research to investigate how to naturally achieve sparsity in deep architectures.
Our experiments on the SHD data set also demonstrated the significance of the accuracy-sparsity trade-off when processing temporal data. We found that the ability of neurons to fire multiple spikes is critical in capturing all the information about the inputs over time. However, since TTFS neurons encode important information in early spikes, they tend to fire too early to capture all the information, which makes them less effective than unconstrained SNNs on temporal data. Although TTFS SNNs are more energy efficient, the relaxation of spike constraints in unconstrained SNNs allows neurons to fire throughout the simulation, thereby capturing all the relevant information. Consequently, they perform significantly better than TTFS SNNs when processing temporal data.
In addition to performance and sparsity, we measured the prediction latency of each method, which is the waiting time required before the system can make reliable predictions. We found that the speed of classification was primarily driven by the weight distribution. Figure 7a shows that both Fast & Deep and our method achieve similar latencies when initialized between −1 and 1, with a slight advantage for Fast & Deep. However, when initialized with only positive values, Fast & Deep requires more simulation time to achieve full confidence in predictions. Low latency is advantageous not only in terms of inference speed but also in improving energy efficiency. If predictions occur early enough, the duration of simulations can be significantly reduced, limiting the number of spikes fired by neurons. In Figure 7b, we demonstrated that reducing the simulation time can lead to a reduction in computational cost for both Fast & Deep and our method while maintaining the same performance. This shows that prediction latency, energy consumption, and performance are closely related and that unconstrained SNNs can offer better trade-offs between these aspects than TTFS SNNs.
The last characteristic that we investigated is the robustness to the noise and weight quantization inherent in analog neuromorphic hardware. The timing of spikes is a critical factor for the performance of TTFS SNNs as it carries most of the information. Therefore, perturbations in these timings and weight constraints can significantly affect the reliability of the feature extraction. In contrast, our proposed method benefits from an increased number of spikes per neuron, providing redundancy that enhances resilience to noise and weight constraints. For instance, Figure 8a demonstrates that our method is less affected by perturbations in spike timings than Fast & Deep. This suggests that our proposed method has the potential to provide more stable learning on analog neuromorphic hardware than Fast & Deep.
Finally, our work specifically concentrated on backpropagation in SNNs with fixed thresholds and time constants. However, several studies have demonstrated that incorporating adaptive thresholds and trainable time constants can enhance the convergence, sparsity, and performance of SNNs (Zambrano et al., 2019; Chen et al., 2022; Fang et al., 2021; Yin et al., 2021). Therefore, future studies could explore the integration of adaptive thresholds and trainable time constants into our proposed method to further enhance the sparsity-accuracy trade-offs in SNNs.
5 Conclusion
Our work demonstrates that relaxing the spike constraint of TTFS SNNs results in improved trade-offs among performance, sparsity, latency, and noise robustness. Our findings also highlight the crucial role of thresholds in regulating the sparsity of unconstrained SNNs during learning, which could serve as a natural alternative to firing rate regularization. Although error backpropagation algorithms for SNNs are incompatible with neuromorphic hardware, their development provides valuable insights into how spiking neurons affect objective functions and could support the development of hardware-compatible algorithms. Therefore, our work contributes to a better understanding of how to compute exact gradients in SNNs and highlights the advantages of using multiple spikes per neuron over TTFS.
Appendix A: Experimental Settings
A.1 Simulations
A.2 Input Encoding
We used a TTFS encoding scheme to benefit from a low number of input spikes and fast processing. For image classification tasks, we encoded pixel intensities into spike timings.
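As an illustration of such a pixel-to-latency code, the sketch below uses a simple linear mapping in which brighter pixels fire earlier and near-zero pixels remain silent; the linear form, the maximum encoding time, and the threshold are assumptions and may differ from the exact encoding used in this work.

```python
import numpy as np

def ttfs_encode(image, t_max=0.1, threshold=0.01):
    """Illustrative TTFS encoding: brighter pixels spike earlier.

    image: 2D array of pixel intensities in [0, 1].
    Returns (times, indices) for pixels above the threshold; dark pixels
    emit no spike, keeping the number of input spikes low.
    """
    intensities = np.asarray(image, dtype=float).ravel()
    active = np.flatnonzero(intensities > threshold)
    times = t_max * (1.0 - intensities[active])   # linear intensity-to-latency map
    return times, active

# toy 2x2 "image": the brightest pixel fires first, the zero pixel never fires
times, idx = ttfs_encode(np.array([[1.0, 0.5], [0.2, 0.0]]))
print(idx, times)   # [0 1 2] [0.   0.05 0.08]
```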
A.3 Implementation of Fast & Deep
To reproduce Fast & Deep with TTFS models, we constrained each neuron to fire at most once in our implementation and used a TTFS softmax cross-entropy loss function, as described in Göltz et al. (2021).
A.4 Architectures and Parameters
For the MNIST, EMNIST, and SHD data sets, we trained both TTFS and unconstrained fully connected SNNs with a batch size of 50 and, for the unconstrained SNNs, a maximum of 30 spikes per neuron. Output spike targets were set to 15 for the target label and 3 for the others. We used a learning rate of λ = 0.003 for image classification and λ = 0.001 for the SHD data set. No data augmentation was used with fully connected networks.
For Fashion-MNIST, we implemented a three-layer, fully connected network composed of two hidden layers of 400 neurons each and a 10-neuron output layer. We allowed a maximum of 5 spikes per neuron for the hidden layers and 20 for the output layer. We also set the target spike counts to 15 for the true class and 3 for the others. We used a batch size of 5 with a learning rate of λ = 0.0005, a learning rate decay factor of 0.5 every 10 epochs, and a minimum rate of 0.0001.
The weight kernels of convolutional neurons were shared within each layer, as in rate-based CNNs. Convolution allows the detection of spatially correlated features and therefore makes networks invariant to translations. In contrast to fully connected SNNs, the translation invariance of CSNNs allows the networks to detect objects at different locations in space. We used a six-layer CSNN composed of two spiking convolutional layers of 15 and 40 5 × 5 filters, respectively, each followed by a 2 × 2 spike-aggregation pooling layer (i.e., the spike trains of input neurons are aggregated into a single spike train). The spikes of the last pooling layer are finally sent to two successive fully connected layers of sizes 300 and 10, respectively. Each layer allows an increasing number of spikes per neuron: a single spike for the first convolutional layer, 3 for the second convolutional layer, 10 for the fully connected layer, and 30 for the output layer. We also set the output spike targets to 30 for the true label and 3 for the others. The CSNN was also trained with data augmentation; in this case, we used elastic distortions (Simard et al., 2003) to transform the MNIST training images. We finally trained the networks for 100 epochs with a batch size of 20, a learning rate of λ = 0.003, a decay factor of 0.5 every 10 epochs, and a minimum rate of 0.0001.
In all our experiments, we used the Adam optimizer with the values of β1, β2, and ϵ set as in the original paper (Kingma & Ba, 2015). Initial weights were randomly drawn from a uniform distribution U(a, b). Networks trained on image classification used the same base time constant of τ_s = 0.130, while for the SHD data set we used a time constant of τ_s = 0.100. All thresholds were manually tuned to find the best-performing networks for each method and data set and were then kept fixed during training. Finally, we did not use any regularization or synaptic scaling techniques in any of our experiments, to provide a fair comparison between TTFS and unconstrained SNNs.
Appendix B: Fully Connected SNNs on MNIST
Performance of Several Methods on the MNIST Data Set.
| Method | Architecture | Test Accuracy |
|---|---|---|
| TTFS | | |
| Fast & Deep (Göltz et al., 2021) | 350 | 97.1 ± 0.1% |
| Wunderlich & Pehle (Wunderlich & Pehle, 2021) | 350 | 97.6 ± 0.1% |
| Alpha Synapses (Comsa et al., 2020) | 340 | 97.96% |
| S4NN (Kheradpisheh & Masquelier, 2020) | 400 | 97.4 ± 0.2% |
| BS4NN (Kheradpisheh et al., 2021) | 600 | 97.0% |
| Mostafa (Mostafa, 2016) | 800 | 97.2% |
| STDBP (Zhang et al., 2022) | 800 | 98.5% |
| Fast & Deep (our implementation; Göltz et al., 2021) | 800 | **97.83 ± 0.08%** |
| Unconstrained | | |
| eRBP (Neftci et al., 2017) | 2 × 500 | 97.98% |
| Lee et al. (Lee et al., 2016) | 800 | 98.71% |
| HM2-BP (Jin et al., 2018) | 800 | 98.84 ± 0.02% |
| This work | 800 | **98.88 ± 0.02%** |
Note: Results for Fast & Deep and our method are highlighted in bold.
Appendix C: Fully Connected SNNs on EMNIST
Performance of Several Methods on the EMNIST Data Set.
| Method | Architecture | Test Accuracy |
|---|---|---|
| TTFS | | |
| Fast & Deep (our implementation; Göltz et al., 2021) | 800 | **83.34 ± 0.27%** |
| Unconstrained | | |
| eRBP (Neftci et al., 2017) | 2 × 200 | 78.17% |
| HM2-BP (Jin et al., 2018) | 2 × 200 | 84.31 ± 0.10% |
| HM2-BP (Jin et al., 2018) | 800 | 85.41 ± 0.09% |
| This work | 800 | **85.75 ± 0.06%** |
Note: Results for Fast & Deep and our method are highlighted in bold.
Appendix D: Fully Connected SNNs on Fashion MNIST
Performance of Several Methods on the Fashion-MNIST Data Set.
| Method | Architecture | Test Accuracy |
|---|---|---|
| TTFS | | |
| S4NN (Kheradpisheh & Masquelier, 2020) | 1000 | 88.0% |
| BS4NN (Kheradpisheh et al., 2021) | 1000 | 87.3% |
| STDBP (Zhang et al., 2022) | 1000 | 88.1% |
| Fast & Deep (our implementation; Göltz et al., 2021) | 2 × 400 | **88.28 ± 0.41%** |
| Unconstrained | | |
| HM2-BP (Jin et al., 2018) | 2 × 400 | 88.99% |
| TSSL-BP (Zhang & Li, 2020) | 2 × 400 | 89.75 ± 0.03% |
| ST-RSBPᵃ (Zhang & Li, 2019) | 2 × 400 | 90.00 ± 0.14% |
| This work | 2 × 400 | **90.19 ± 0.12%** |
Note: Results for Fast & Deep and our method are highlighted in bold. ᵃThe trained model has recurrent connections.
Appendix E: Fully Connected SNNs on SHD
Performance of Several Methods on the Spiking Heidelberg Digits (SHD) Data Set.
| Method | Architecture | Test Accuracy |
|---|---|---|
| TTFS | | |
| Fast & Deep (our implementation; Göltz et al., 2021) | 128 | **47.37 ± 1.65%** |
| Unconstrained | | |
| Cramer et al. (Cramer et al., 2022) | 128 | 48.10 ± 1.6% |
| Cramer et al.ᵃ (Cramer et al., 2022) | 128 | 71.4 ± 1.9% |
| This work | 128 | **66.79 ± 0.66%** |
Note: Results for Fast & Deep and our method are highlighted in bold. ᵃThe trained model has recurrent connections.
Appendix F: Convolutional SNNs on MNIST
Table 6: Network Architectures Used in Table 7.
| Network Name | Architecture |
|---|---|
| Net1 | 32C5-P2-16C5-P2-10 |
| Net2 | 12C5-P2-64C5-P2-10 |
| Net3 | 15C5-P2-40C5-P2-300-10 |
| Net4 | 20C5-P2-50C5-P2-200-10 |
| Net5 | 32C5-P2-32C5-P2-128-10 |
| Net6 | 16C5-P2-32C5-P2-800-128-10 |
Note: 15C5 represents a convolution layer with 15 5 × 5 filters and P2 represents a 2 × 2 pooling layer.
Table 7: Performance of Several Methods on the MNIST Data Set with Spiking Convolutional Neural Networks.
| Method | Architecture | Test Accuracy |
|---|---|---|
| TTFS | | |
| Zhou et al.ᵃ (Zhou et al., 2021) | Net1 | 99.33% |
| STDBPᵃ (Zhang et al., 2022) | Net6 | 99.4% |
| Fast & Deep (our implementation) | Net3 | **99.22 ± 0.05%** |
| Fast & Deepᵃ (our implementation) | Net3 | **99.46 ± 0.01%** |
| Unconstrained | | |
| Lee et al.ᵃ (Lee et al., 2016) | Net4 | 99.31% |
| HM2-BPᵃ (Jin et al., 2018) | Net3 | 99.42 ± 0.11% |
| TSSL-BP (Zhang & Li, 2020) | Net3 | 99.50 ± 0.02% |
| ST-RSBPᵃ (Zhang & Li, 2019) | Net2 | 99.50 ± 0.03% |
| ST-RSBPᵃ (Zhang & Li, 2019) | Net3 | 99.57 ± 0.04% |
| This work | Net3 | **99.38 ± 0.04%** |
| This workᵃ | Net3 | **99.60 ± 0.03%** |
Note: The network topologies are given in Table 6. ᵃThe network has been trained using data augmentation.
Code Availability
The code produced in this work will be made available at: https://github.com/Florian-BACHO/bats