Abstract

Error backpropagation in networks of spiking neurons (SpikeProp) shows promise for the supervised learning of temporal patterns. However, its widespread use is hindered by its computational load and occasional convergence failures. In this letter, we show that the neuronal firing time equation at the core of SpikeProp can be solved analytically using the Lambert W function, offering a marked reduction in execution time over the step-based method used in the literature. Applying this analytical method to SpikeProp, we find that training time per epoch can be reduced by 12% to 56% under different experimental conditions. Finally, this work opens the way for further investigations of SpikeProp’s convergence behavior.

1  Introduction

SpikeProp adapts the classical error backpropagation algorithm to networks of spiking neurons (Bohte, Kok, & La Poutré, 2000, 2002). Although improvements have been proposed (Schrauwen & Van Campenhout, 2004; McKennoch, Liu, & Bushnell, 2006; Ghosh-Dastidar & Adeli, 2007), SpikeProp's running time and convergence are unsatisfactory in comparison to backpropagation in networks of sigmoid perceptrons (Fujita, Takase, Kita, & Hayashi, 2008; Takase et al., 2009; Thiruvarudchelvan, Crane, & Bossomaier, 2013; Thiruvarudchelvan, Moore, & Antolovich, 2013).

In a formal spiking neuron model, the neuron's response to input stimuli is governed by the equation $u(t) = \vartheta$, the solutions of which are the firing times of spikes (Gerstner, 1995). The function $u(t)$ represents the neuron's membrane potential at time $t$, and $\vartheta$ is a constant threshold.

Under a response coding scheme called time-to-first-spike, a single spike is emitted at time $t^* = \inf\{\, t : u(t) \geq \vartheta \,\}$. The single firing time is computed by evaluating $u(t)$ sequentially with a fixed time step $\Delta t$ until obtaining a value for $u(t)$ greater than or equal to $\vartheta$. When no solution is found, the neuron emits no spike.
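For illustration, the step-based search can be sketched as follows; the scan range t_max, the starting time of zero, and the function names are illustrative choices rather than part of the published method:

```python
def step_based_firing_time(u, theta, t_max=50.0, dt=0.1):
    """Scan the membrane potential u(t) on a fixed grid and return the first
    grid point where it reaches the threshold theta, or None if the neuron
    never fires before t_max."""
    t = 0.0
    while t <= t_max:
        if u(t) >= theta:
            return t
        t += dt
    return None
```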

In this letter, we show that the firing time equation $u(t) = \vartheta$, analyzed piecewise, can be solved analytically using the Lambert W function.1 To motivate this novel result, we present a performance comparison of the analytical and step-based solutions, and we experiment with SpikeProp to determine how analytical performance affects the training time of spiking neural networks.

2  Analytical Solution of the Firing Time Equation

The membrane potential of a spiking neuron, receiving $n$ postsynaptic potentials (PSPs) from antecedent neurons in a network, is expressed as
$$u(t) = \sum_{k=1}^{n} w_k\, \epsilon(t - t_k), \tag{2.1}$$
with the PSP profile
$$\epsilon(s) = \begin{cases} (s/\tau)\, e^{1 - s/\tau} & \text{if } s > 0, \\ 0 & \text{otherwise,} \end{cases}$$
where $w_k$ is the weight of the $k$th PSP, $t_k$ is the onset time of the $k$th PSP, and $\tau$ is a decay constant. Without loss of generality, we now consider that $\tau = 1$ since time can be measured relative to this constant.

For distinct ordered onset times $t_1 < t_2 < \cdots < t_n$, equation 2.1 becomes
$$u(t) = (A_k t + B_k)\, e^{-t} \quad \text{for } t \in [t_k, t_{k+1}), \tag{2.2}$$
where $A_k = \sum_{i=1}^{k} w_i\, e^{1 + t_i}$ and $B_k = -\sum_{i=1}^{k} w_i t_i\, e^{1 + t_i}$.
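To make equation 2.2 concrete, the coefficients can be accumulated with cumulative sums. The following Python sketch assumes $\tau = 1$, onsets sorted in increasing order, and function and variable names of our own choosing:

```python
import numpy as np

def piecewise_coefficients(weights, onsets):
    """Coefficients (A_k, B_k) such that u(t) = (A_k t + B_k) exp(-t)
    on the interval [t_k, t_{k+1}), as in equation 2.2 (tau = 1)."""
    w = np.asarray(weights, dtype=float)
    t = np.asarray(onsets, dtype=float)   # assumed sorted: t_1 < t_2 < ... < t_n
    scale = w * np.exp(1.0 + t)           # each PSP contributes w_i e^{1+t_i} (t - t_i) e^{-t}
    A = np.cumsum(scale)                  # A_k = sum_{i<=k} w_i e^{1+t_i}
    B = np.cumsum(-t * scale)             # B_k = -sum_{i<=k} w_i t_i e^{1+t_i}
    return A, B
```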

We now derive the analytical solution to the firing time equation using the principal branch of the Lambert W function, noted $W_0$. We illustrate this function in Figure 1.

Figure 1:

The principal branch of the Lambert W function, noted $W_0$ and defined for $x \geq -1/e$, is the inverse of $x \mapsto x e^{x}$ on the interval $[-1, \infty)$. We use only its negative part (solid line), defined for $-1/e \leq x < 0$.

Lemma 1.
In time-to-first-spike coding, the firing time $t^*$ of a spiking neuron having a membrane potential described by equation 2.2, and a threshold $\vartheta > 0$, is given by
$$t^* = -\frac{B_k}{A_k} - W_0\!\left(-\frac{\vartheta}{A_k}\, e^{-B_k/A_k}\right),$$
where $A_k$ and $B_k$ are the coefficients corresponding to the first interval among $I_1, \ldots, I_n$ where $u(t) \geq \vartheta$ for some $t$.
Proof.

To solve $u(t) = \vartheta$ analytically, we start from $I_1 = [t_1, t_2)$ and search for the earliest interval during which $u(t)$ reaches $\vartheta$. Let us consider an interval $I_k = [t_k, t_{k+1})$, with $t_{k+1}$ taken as infinity for $k = n$, for which we have $u(t_k) < \vartheta$. On $I_k$, the membrane potential $u(t) = (A_k t + B_k)\, e^{-t}$ has the first derivative $u'(t) = (A_k - A_k t - B_k)\, e^{-t}$, with a single root at $T_k = 1 - B_k/A_k$ when $A_k \neq 0$ (note that this root might not be in $I_k$). Thus, if $T_k \geq t_{k+1}$ and $(A_k t_{k+1} + B_k)\, e^{-t_{k+1}} \geq \vartheta$, the function must increase on $I_k$, and there exists a solution to the firing time equation. In this case, the search ends with the interval $I_k$. Otherwise, the function must have a maximum greater than or equal to $\vartheta$ on $I_k$ in order to find a solution. This happens if $t_k < T_k < t_{k+1}$ and $A_k\, e^{B_k/A_k - 1} \geq \vartheta$, and in this case, the search ends with the interval $I_k$. If no solution is to be found on $I_k$, we continue the search with $I_{k+1}$, for which $u(t_{k+1}) < \vartheta$.

[Algorithm 1: iterative search over the intervals $I_1, \ldots, I_n$ for the firing interval, following the case analysis above.]

Algorithm 1 describes this process. Its outputs are the firing interval $[t_k, t_{k+1})$, with $1 \leq k \leq n$ and $t_{n+1} = \infty$, and the coefficients $A_k$ and $B_k$, or the empty set if $\vartheta$ is never reached.
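A minimal Python sketch of this search, under the case analysis reconstructed above (the inlined coefficient computation and all names are our own choices):

```python
import numpy as np

def find_firing_interval(weights, onsets, theta):
    """Algorithm 1 (sketch): return (k, A_k, B_k) for the first interval
    I_k = [t_k, t_{k+1}) on which u(t) reaches theta, or None otherwise.
    Assumes tau = 1 and onsets sorted in increasing order."""
    w = np.asarray(weights, dtype=float)
    t = np.asarray(onsets, dtype=float)
    scale = w * np.exp(1.0 + t)
    A = np.cumsum(scale)
    B = np.cumsum(-t * scale)
    n = len(t)
    for k in range(n):
        t_next = t[k + 1] if k + 1 < n else np.inf
        if A[k] <= 0.0:
            continue                          # u cannot reach theta on I_k
        T_k = 1.0 - B[k] / A[k]               # root of the derivative of u
        if T_k >= t_next:
            # u increases over the whole interval: check its right endpoint
            if (A[k] * t_next + B[k]) * np.exp(-t_next) >= theta:
                return k, A[k], B[k]
        elif T_k > t[k]:
            # the maximum of u lies inside I_k: check its value
            if A[k] * np.exp(B[k] / A[k] - 1.0) >= theta:
                return k, A[k], B[k]
    return None
```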

Note that we do not have to search for a solution when $A_k \leq 0$. First, if $A_k = 0$, we have $u(t) = B_k\, e^{-t}$ on $I_k$, in which case the membrane potential is driven toward zero, thus precluding the neuron from firing on this interval. Second, $T_k$ is the location of a maximum only if $u''(T_k) = -A_k\, e^{-T_k} < 0$, which is equivalent to $A_k > 0$, such that $A_k$ must be positive for $u(t)$ to reach $\vartheta$ on $I_k$.

Finally, $t^*$ is the smallest solution to the equation $(A_k t + B_k)\, e^{-t} = \vartheta$ on $I_k$. Using the substitutions $x = -(t + B_k/A_k)$ and $z = -(\vartheta/A_k)\, e^{-B_k/A_k}$, and some algebra, we get the equation $x\, e^{x} = z$ with the solution $x = W_0(z)$, from which we obtain
$$t^* = -\frac{B_k}{A_k} - W_0\!\left(-\frac{\vartheta}{A_k}\, e^{-B_k/A_k}\right).$$
(The principal branch gives the largest root $x$, hence the smallest firing time $t$.) This solution requires that $-1/e \leq -(\vartheta/A_k)\, e^{-B_k/A_k} < 0$. The right bound comes from $\vartheta > 0$ and $A_k > 0$. The left bound is derived by using $u(T_k) = A_k\, e^{B_k/A_k - 1} \geq \vartheta$ and $A_k > 0$ to find that $-(\vartheta/A_k)\, e^{-B_k/A_k} \geq -1/e$.
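The closed form of lemma 1 can then be evaluated with the principal branch, for instance via SciPy (a sketch under the same assumptions; scipy.special.lambertw returns a complex value, so its real part is kept):

```python
import numpy as np
from scipy.special import lambertw

def analytical_firing_time(A_k, B_k, theta):
    """Firing time on the interval returned by algorithm 1 (lemma 1),
    i.e., the smallest root of (A_k t + B_k) exp(-t) = theta."""
    z = -(theta / A_k) * np.exp(-B_k / A_k)
    if z < -1.0 / np.e or z >= 0.0:
        return None                 # outside the domain of W_0: no crossing here
    return -B_k / A_k - float(lambertw(z, 0).real)
```

Per remark 1 below, a general decay constant $\tau$ would be handled by first dividing the onset times by $\tau$ and multiplying the returned firing time by $\tau$.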
Remark 1.

For a general value of the decay constant $\tau$, the firing time is $\tau t^*$, where $t^*$ is obtained from lemma 1 with the onset times rescaled as $t_k / \tau$.

Remark 2.

If algorithm 1 returns the empty set, $t^*$ does not exist (it is infinite by definition of the infimum).

Remark 3.

The condition $-1/e \leq -(\vartheta/A_k)\, e^{-B_k/A_k} < 0$ can be dropped. As per algorithm 1, $u(t)$ cannot reach $\vartheta$ at iteration $k$ with $A_k \leq 0$ or with a maximum $A_k\, e^{B_k/A_k - 1} < \vartheta$, so the argument of $W_0$ always lies in its domain.

3  Analytical Performance and Step-Based Solutions

We now compare the performance of the analytical and step-based solutions, implemented in Matlab R2015b 64 bits on Ubuntu 14.04 with a Pentium Dual-Core T4500 2.3 GHz processor.2 We generated 100,000 random instances of spiking neurons similar to those used in Bohte et al. (2000, 2002). The input stimuli are composed of five trains of 16 PSPs. Each PSP has a random weight between −1 and 2, the first PSP of each train has a random onset between 1 and 7, and subsequent onsets of each train are each delayed by a value of 1. We set the decay constant to $\tau = 7$. We computed the Lambert W function numerically with Halley's method (Corless, Gonnet, Hare, Jeffrey, & Knuth, 1996). We measured the execution time required to evaluate the analytical solutions and the step-based solutions (with $\Delta t = 0.1$ and $\Delta t = 0.01$) for the 100,000 neuron instances using the threshold values $\vartheta = 1, 5, 10, 15$, and we computed the corresponding mean execution time per evaluation. The results are presented in Table 1.
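For reference, Halley's iteration from Corless et al. (1996), restricted to the negative part of the principal branch used here, can be sketched as follows (the starting guess, tolerance, and iteration cap are illustrative choices):

```python
import math

def lambertw0_halley(z, tol=1e-12, max_iter=50):
    """Principal branch W_0(z) for -1/e <= z < 0, via Halley's method."""
    if z < -1.0 / math.e or z >= 0.0:
        raise ValueError("expected -1/e <= z < 0")
    if abs(z + 1.0 / math.e) < 1e-15:
        return -1.0                                # branch point: W_0(-1/e) = -1
    # start near the branch point for small z, otherwise at a mid-range guess
    w = -1.0 + math.sqrt(2.0 * (1.0 + math.e * z)) if z < -0.25 else -0.5
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - z
        w_new = w - f / (ew * (w + 1.0) - (w + 2.0) * f / (2.0 * w + 2.0))
        if abs(w_new - w) <= tol:
            return w_new
        w = w_new
    return w
```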

Table 1:
Performance Comparison of Analytical and Step-Based Solutions.
                                   ϑ = 1    ϑ = 5    ϑ = 10    ϑ = 15
Analytical solution
Step-based solution (Δt = 0.1)
Step-based solution (Δt = 0.01)

Notes: Mean execution time (in seconds) per evaluation over 100,000 instances. Time was measured using the Matlab tic and toc commands.

Overall, the analytical solutions were evaluated faster than the step-based solutions (from 1.45 to 101.25 times faster depending on $\vartheta$ and $\Delta t$). We also found that occasionally the step-based method could miss the firing time found analytically, as $u(t)$ remained over $\vartheta$ for less than 0.1 units of time between two consecutive steps. When this happens, either the analytical solution is missed, as $u(t)$ never rises again to reach $\vartheta$, or there is a step-based solution that is at least one time step late. Table 2 lists the type of errors for each value of $\vartheta$. Using $\Delta t = 0.01$ prevented these errors, at a cost of more execution time. Note that $\Delta t = 0.1$ was used most often in previous work with SpikeProp.

Table 2:
Errors of the Step-Based Method (Δt = 0.1).

                              ϑ = 1    ϑ = 5    ϑ = 10    ϑ = 15
Missed analytical solution
Late step-based solution

4  Application to SpikeProp

4.1  Network Architecture and Weight Update Rules

We now describe the feedforward network architecture in SpikeProp (Bohte et al., 2000, 2002). Neurons in successive network layers are linked with connections comprising multiple synaptic terminals. A neuron in the network layer L fires a single action potential (or spike) when its membrane potential crosses a determined threshold. This spike generates postsynaptic potentials (PSPs) in the output connections of the neuron (one PSP per synaptic terminal). Figure 2 illustrates this network configuration. The membrane potential activity of each neuron in layer L+1 stems from the spatial and temporal summation of incoming PSPs from layer L, following equation 2.1. As before, we assume that the PSPs of a connection are each delayed by one unit of time. We also assume that, following a spike, the onset of the first PSP is delayed by one unit of time. Furthermore, the decay constant $\tau$ is usually set to the same value for every neuron in the network.
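To make the summation concrete, the membrane potential of a layer L+1 neuron can be sketched as follows; the per-terminal delay of $k+1$ time units reflects the one-unit spacing assumed above, and all names are illustrative:

```python
import math

def psp(s, tau=7.0):
    """Spike response epsilon(s) = (s/tau) exp(1 - s/tau) for s > 0 (equation 2.1)."""
    return (s / tau) * math.exp(1.0 - s / tau) if s > 0.0 else 0.0

def membrane_potential(t, presyn_times, weights, tau=7.0):
    """Membrane potential of a layer L+1 neuron at time t.

    presyn_times: firing times of the layer L neurons (None if no spike);
    weights[j][k]: weight of the kth terminal of the connection from neuron j;
    terminal k is assumed to add a delay of k+1 time units to the spike."""
    u = 0.0
    for t_j, w_j in zip(presyn_times, weights):
        if t_j is None:
            continue
        for k, w_jk in enumerate(w_j):
            u += w_jk * psp(t - t_j - (k + 1), tau)
    return u
```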

Figure 2:

Network connections. Neuron N1 on layer L is connected to neurons N2 and N3 on layer L+1. Each connection displays five synaptic terminals. Note that we use 16 terminals in our experiments.

We now assume that the connections in the network lead to a final layer comprising a single output neuron providing a firing time in response to the network's input stimuli. Such a network can be trained to fire a spike at a desired time $t_d$ by modifying its weights according to the firing time actually observed, $t_a$. Suppose that there are $M$ neurons in the layer connected to the output neuron. In that layer, the update rule for the $k$th weight ($w_{jk}$) connecting the $j$th neuron (with firing time $t_j$) to the output neuron is given by
$$\Delta w_{jk} = -\eta\, \epsilon(t_a - t_j - d_k)\, \delta, \tag{4.1}$$
with
$$\delta = \frac{t_d - t_a}{\sum_{j=1}^{M} \sum_{k} w_{jk}\, \epsilon'(t_a - t_j - d_k)}, \tag{4.2}$$
where $\eta$ is a step size hyperparameter, $d_k$ is the delay of the $k$th synaptic terminal, and $\epsilon'$ is the time derivative of the PSP profile.
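A Python sketch of this output-layer update, following equations 4.1 and 4.2 as reconstructed above (the kernel, its derivative, the terminal delays of $k+1$ units, and all names are stated assumptions):

```python
import math

def psp(s, tau=7.0):
    """Spike response epsilon(s) = (s/tau) exp(1 - s/tau) for s > 0 (equation 2.1)."""
    return (s / tau) * math.exp(1.0 - s / tau) if s > 0.0 else 0.0

def psp_derivative(s, tau=7.0):
    """Time derivative of the spike response function."""
    return (1.0 / tau) * (1.0 - s / tau) * math.exp(1.0 - s / tau) if s > 0.0 else 0.0

def output_layer_update(weights, firing_times, t_actual, t_desired, eta, tau=7.0):
    """Weight changes for the M connections into the output neuron.

    weights[j][k] is the kth weight from neuron j (firing at firing_times[j]);
    terminal k is assumed to carry a delay of k+1 time units."""
    denom = sum(w_jk * psp_derivative(t_actual - t_j - (k + 1), tau)
                for t_j, w_j in zip(firing_times, weights)
                for k, w_jk in enumerate(w_j))
    delta = (t_desired - t_actual) / denom
    return [[-eta * psp(t_actual - t_j - (k + 1), tau) * delta
             for k, _ in enumerate(w_j)]
            for t_j, w_j in zip(firing_times, weights)]
```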
This rule can be extended to any layer L. Suppose layer L has $M$ neurons and layer L+1 has $M_1$ neurons. The update rule for the $k$th weight $w_{ijk}$ connecting the $i$th neuron of layer $L-1$ to the $j$th neuron of layer L is given by
$$\Delta w_{ijk} = -\eta\, \epsilon(t_j - t_i - d_k)\, \delta_j, \tag{4.3}$$
with
$$\delta_j = \frac{\sum_{m=1}^{M_1} \delta_m \sum_{l} w_{jml}\, \epsilon'(t_m - t_j - d_l)}{\sum_{i} \sum_{k} w_{ijk}\, \epsilon'(t_j - t_i - d_k)}, \tag{4.4}$$
where the $\delta_j$ terms are computed recursively from the last layers of the network toward the first layers. When the layer L is the layer connected to the output neuron, we have $M_1 = 1$ and $\delta_1 = \delta$ from equation 4.2.
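The recursion of equation 4.4 can be sketched for one neuron $j$ of layer L as follows (same delay convention and naming caveats as in the previous sketch):

```python
import math

def psp_derivative(s, tau=7.0):
    """Time derivative of the spike response function (as in the previous sketch)."""
    return (1.0 / tau) * (1.0 - s / tau) * math.exp(1.0 - s / tau) if s > 0.0 else 0.0

def hidden_layer_delta(t_j, next_times, next_deltas, w_next, prev_times, w_prev, tau=7.0):
    """delta_j for a neuron of layer L firing at t_j (equation 4.4, reconstructed).

    next_times, next_deltas: firing times and deltas of the layer L+1 neurons;
    w_next[m][l]: weight of terminal l from neuron j to neuron m of layer L+1;
    prev_times, w_prev: firing times of layer L-1 neurons and their weights into j."""
    num = sum(d_m * sum(w_ml * psp_derivative(t_m - t_j - (l + 1), tau)
                        for l, w_ml in enumerate(w_m))
              for t_m, d_m, w_m in zip(next_times, next_deltas, w_next))
    den = sum(w_ik * psp_derivative(t_j - t_i - (k + 1), tau)
              for t_i, w_i in zip(prev_times, w_prev)
              for k, w_ik in enumerate(w_i))
    return num / den
```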
In a binary classification problem, different input stimuli patterns are to be assigned to one of two classes by the network. To train the network to discriminate between classes, we start by assigning distinct target firing times to each class, and we initialize the network with random weights. Then we present the input patterns to the network in random order, and we modify the weights after each pattern using the update rules described by equations 4.1 to 4.4. After an epoch (a presentation of all patterns), we evaluate the network's error on all patterns at once: either we stop training the network if the error is satisfactory, or we initiate another epoch if the error is too high. For a problem with $C$ patterns, the error is expressed by the sum-of-squares
$$E = \frac{1}{2} \sum_{c=1}^{C} \left(t_a^{(c)} - t_d^{(c)}\right)^2,$$
where $t_d^{(c)}$ and $t_a^{(c)}$ are, respectively, the desired and actual firing time for the $c$th pattern.
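The training loop described in this paragraph can be sketched as follows; network.forward and network.update_weights are hypothetical helpers standing in for equations 2.1 and 4.1 to 4.4, and the 1/2 factor follows the sum-of-squares error given above:

```python
import random

def train_spikeprop(network, patterns, targets, eta, max_epochs=20000, error_goal=1.0):
    """Epoch loop: per-pattern weight updates in random order, then an
    epoch-level error check over all patterns."""
    E = float("inf")
    for epoch in range(1, max_epochs + 1):
        order = list(range(len(patterns)))
        random.shuffle(order)
        for c in order:
            t_a = network.forward(patterns[c])             # actual firing time
            network.update_weights(patterns[c], targets[c], t_a, eta)
        E = 0.5 * sum((network.forward(p) - t_d) ** 2
                      for p, t_d in zip(patterns, targets))
        if E < error_goal:
            return epoch, E                                # convergent run
    return max_epochs, E                                   # nonconvergent run
```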

4.2  XOR Problem

We now test the analytical solution in SpikeProp applied to the modified XOR problem described in Table 3 (see Bohte et al., 2000, 2002). We use the same computing environment as in section 3.

Table 3:
Temporal Encoding of the XOR Problem.
Input 1    Input 2    Input 3    Class    Firing Time
0          0          0          0        16
0          6          0          1        10
6          0          0          1        10
6          6          0          0        16

Notes: Input values correspond to the firing times of spikes that generate PSPs in the first layer of the network. Inputs 1 and 2 can represent an early spike fired at time 0 or a late spike fired at time 6. Input 3 is a reference spike always fired at time 0. Class 1 represents an early response of the network’s output neuron, and class 0 represents a late response.

To compare analytical and step-based solutions, we generated 10 spiking neural network instances having 3 inputs, 3 hidden neurons, and 1 output neuron, plus 10 additional instances with 5 hidden neurons. All network weights were initialized to a random value between −1 and 2. A low value is used for the threshold $\vartheta$ so that, initially, it is very likely that all neurons fire for all input patterns; this was found to facilitate the training process in previous work with SpikeProp. We trained each network with SpikeProp using the values 0.01, 0.05, and 0.10 for the hyperparameter $\eta$ until the error $E$ fell under a value of 1, or until we reached a maximum of 20,000 epochs. We call a run of SpikeProp convergent if it ended with $E < 1$ in fewer than 20,000 epochs (otherwise it is nonconvergent).

For each run, we measured the number of epochs and the training time, and we computed the corresponding means for each experimental condition (determined by the solution method, the number of hidden neurons, and the value of $\eta$). We also counted the number of convergent runs and derived the training time per epoch. The results are presented in Tables 4 and 5. Note that we excluded nonconvergent runs when computing mean training epochs and mean training time. Over all 120 runs, the largest number of epochs for a convergent run was 9240.

Table 4:
Performance Comparison of Analytical and Step-Based Solutions for SpikeProp (Three Hidden Neurons).
                      η = 0.01              η = 0.05              η = 0.10
Analytical solution   332 epochs            1979 epochs           291 epochs
                      4.1 s                 24.1 s                3.6 s
                      10 convergent runs    10 convergent runs    8 convergent runs
                      12.3 ms/epoch         12.2 ms/epoch         11.8 ms/epoch
Step-based solution   744 epochs            1993 epochs           597 epochs
                      11.2 s                31.5 s                9.6 s
                      10 convergent runs    9 convergent runs     7 convergent runs
                      15.0 ms/epoch         18.3 ms/epoch         26.5 ms/epoch

Notes: Each cell presents the mean number of training epochs and mean training time (in seconds) over convergent runs, the number of convergent runs, and the mean training time per epoch (in milliseconds) over all 10 runs. Time was measured using the Matlab tic and toc commands.

Table 5:
Performance Comparison of Analytical and Step-Based Solutions for SpikeProp (Five Hidden Neurons).
                      η = 0.01              η = 0.05              η = 0.10
Analytical solution   312 epochs            154 epochs            192 epochs
                      6.2 s                 3.1 s                 3.8 s
                      10 convergent runs    10 convergent runs    10 convergent runs
                      19.9 ms/epoch         19.9 ms/epoch         19.9 ms/epoch
Step-based solution   334 epochs            125 epochs            394 epochs
                      7.6 s                 2.9 s                 10.8 s
                      10 convergent runs    10 convergent runs    10 convergent runs
                      22.7 ms/epoch         23.0 ms/epoch         27.3 ms/epoch

Notes: Mean number of training epochs and mean training time (in seconds) over convergent runs, number of convergent runs, and mean training time per epoch (in milliseconds) over all 10 runs. Time was measured using the Matlab tic and toc commands.

Our results show that the hyperparameter $\eta$ and the number of hidden neurons can have much more impact on the training process than the solution method. With one exception (five hidden neurons, $\eta = 0.05$), training took less time and fewer epochs on average with the analytical solution. For specific network instances, using the analytical solution instead of the step-based solution does not guarantee a reduction in the number of epochs and training time. Still, we obtained a 12% to 56% reduction of training time per epoch across different conditions (see Table 6). Note that we observed no occurrence of missed analytical solutions or late step-based solutions while training these networks.

Table 6:
Relative Reduction in Training Time per Epoch for the Analytical Solution Method in Different Conditions.
                        η = 0.01    η = 0.05    η = 0.10
Three hidden neurons
Five hidden neurons

Finally, we obtained some convergence failures with networks having three hidden neurons. However, with no errors in step-based solutions and with nonconvergent runs involving both analytical and step-based solutions, there is no indication that step-based solutions cause additional convergence failures.

5  Discussion

5.1  Scope of the Analytical Solution

Our results, while promising, are limited by the small scale of our experiments. A model of how analytical performance scales up with PSP count and network size could help inform the use of our method on large-scale problems. However, we found that the derivation of such a model is challenging due to the firing process of spiking neurons that affects the PSP count in a network. Approximations of the firing process, such as those used in Thiruvarudchelvan, Crane, et al. (2013), could help toward that goal.

It is of interest to extend our analytical solution to other PSP profiles and to multispiking neural networks that add exponential terms to the membrane potential to represent refractoriness (Booij & Nguyen, 2005; Ghosh-Dastidar & Adeli, 2009). Changing the firing time equation can preclude its solution via the Lambert W function. Still, this function is not elementary, and a numerical method is required for its evaluation. Halley’s method could very well be adapted to more general forms of membrane potential. However, when possible, it is desirable to solve the firing time equation via special functions because of their well-understood behavior, which facilitates their use.

Furthermore, it would be worthwhile to adapt our analytical solution to recurrent spiking neural networks. In such networks, the onset times of PSPs are not known in advance, and our method should be modified to compute firing times simultaneously in all neurons with a list of known onset times that could be appended as new spikes are fired.

5.2  Significance of the Analytical Performance

On average, analytical solutions helped to reduce the training time per epoch compared with step-based solutions under several experimental conditions. Those conditions can have a higher impact on performance than the solution method itself, but identifying optimal conditions is a time-consuming process that is largely problem dependent. Analytical solutions could contribute to speeding up this process, and further experiments are needed to confirm their performance across different problems. Also, for large-scale spiking neural networks, parallel implementations should be employed as well to further improve training time (Nuno-Maganda, Arias-Estrada, Torres-Huitzil, & Girau, 2009).

We found that solving the firing time equation is subject to errors with the step-based method. These errors are very rare, and they did not occur during our experiment with SpikeProp. Their impact on convergence remains uncertain, which warrants further study of the properties of SpikeProp.

6  Conclusion

In this letter, we have shown that the firing time of spiking neurons in SpikeProp can be determined analytically via the Lambert W function. This analytical solution offers better performance than the step-based solution employed in the literature. In certain situations, it can even correct erroneous results of the step-based solution. Furthermore, we showed that the analytical solution reduced the training time per epoch of SpikeProp.

Further work involves the development of a model of analytical performance scale-up to improve our understanding of analytical and step-based solutions, the extension of our analytical method to other variants of spiking neural networks and to larger scales to solve real-world classification problems, and additional experimentation to characterize the convergence behavior of SpikeProp.

Acknowledgments

This research is supported by the Bill and Melinda Gates Foundation, award OPP1110049. We are grateful to the reviewers for their constructive comments, which helped improve this work.

References

Barry, D. A., Parlange, J.-Y., Li, L., Prommer, H., Cunningham, C. J., & Stagnitti, F. (2000). Analytical approximations for real values of the Lambert-W function. Mathematics and Computers in Simulation, 53, 95–103.

Bohte, S. M., Kok, J. N., & La Poutré, H. (2000). Spike-prop: Error-backpropagation in multi-layer networks of spiking neurons. In Proceedings of the 8th European Symposium on Artificial Neural Networks (pp. 419–425).

Bohte, S. M., Kok, J. N., & La Poutré, H. (2002). Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing, 48, 17–37.

Booij, O., & Nguyen, H. T. (2005). A gradient descent rule for spiking neurons emitting multiple spikes. Information Processing Letters, 95(6), 552–558.

Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J., & Knuth, D. E. (1996). On the Lambert W function. Advances in Computational Mathematics, 5, 329–359.

Fujita, M., Takase, H., Kita, H., & Hayashi, T. (2008). Shape of error surfaces in SpikeProp. In Proceedings of the 2008 International Joint Conference on Neural Networks (pp. 840–844). Piscataway, NJ: IEEE.

Gerstner, W. (1995). Time structure of the activity in neural network models. Physical Review E, 51, 738–758.

Ghosh-Dastidar, S., & Adeli, H. (2007). Improved spiking neural networks for EEG classification and epilepsy and seizure detection. Integrated Computer-Aided Engineering, 14(3), 187–212.

Ghosh-Dastidar, S., & Adeli, H. (2009). A new supervised learning algorithm for multiple spiking neural networks with application in epilepsy and seizure detection. Neural Networks, 22(10), 1419–1431.

McKennoch, S., Liu, D., & Bushnell, L. G. (2006). Fast modifications to the SpikeProp algorithm. In Proceedings of the 2006 International Joint Conference on Neural Networks (pp. 3970–3977). Piscataway, NJ: IEEE.

Nuno-Maganda, M. A., Arias-Estrada, M., Torres-Huitzil, C., & Girau, N. (2009). Hardware implementation of spiking neural network classifiers based on backpropagation-based learning algorithms. In Proceedings of the 2009 International Joint Conference on Neural Networks (pp. 2294–2301). Piscataway, NJ: IEEE.

Schrauwen, B., & Van Campenhout, J. (2004). Extending SpikeProp. In Proceedings of the 2004 International Joint Conference on Neural Networks (pp. 471–475). Piscataway, NJ: IEEE.

Takase, H., Fujita, M., Kawanaka, H., Tsuruoka, S., Kita, H., & Hayashi, T. (2009). Obstacle to training SpikeProp networks: Cause of surges in training process. In Proceedings of the 2009 International Joint Conference on Neural Networks (pp. 3062–3066). Piscataway, NJ: IEEE.

Thiruvarudchelvan, V., Crane, J. W., & Bossomaier, T. (2013). Analysis of SpikeProp convergence with alternative spike response functions. In Proceedings of the 2013 IEEE Symposium on Foundations of Computational Intelligence (pp. 98–105). Piscataway, NJ: IEEE.

Thiruvarudchelvan, V., Moore, W., & Antolovich, M. (2013). Improving the efficiency of spiking network learning. In Proceedings of the 20th International Conference on Neural Information Processing (pp. 172–179). Berlin: Springer.

Notes

1

This special function is often available in numerical analysis software, and analytical approximations also exist (Barry et al., 2000).

2

We developed our own Matlab scripts and functions, with no particular run-time optimizations.