## Abstract

The multispike tempotron (MST) is a powerful, single spiking neuron model that can solve complex supervised classification tasks. It is also internally complex, computationally expensive to evaluate, and unsuitable for neuromorphic hardware. Here we aim to understand whether it is possible to simplify the MST model while retaining its ability to learn and process information. To this end, we introduce a family of generalized neuron models (GNMs) that are a special case of the spike response model and much simpler and cheaper to simulate than the MST. We find that over a wide range of parameters, the GNM can learn at least as well as the MST does.

We identify the temporal autocorrelation of the membrane potential as the most important ingredient of the GNM that enables it to classify multiple spatiotemporal patterns. We also interpret the GNM as a chemical system, thus conceptually bridging computation by neural networks with molecular information processing. We conclude the letter by proposing alternative training approaches for the GNM, including error trace learning and error backpropagation.

## 1 Introduction

Spiking neurons have been shown to be computationally more powerful than standard rate-coded neurons (Maass, 1997; Gütig & Sompolinsky, 2006; Rubin, Monasson, & Sompolinsky, 2010). This motivates the hope that networks of spiking neurons (SNNs) can be built smaller than corresponding networks of rate-coded neurons while performing the same computational task. This would be interesting because current deep architectures, while solving sophisticated tasks, also require extremely large models. Networks requiring up to a billion weights are not uncommon, for example, in language processing (Radford et al., 2018) or image recognition (Mahajan et al., 2018).

In practice, there are difficulties, however. Spiking neurons are more expensive to simulate than nonspiking units, which may outweigh any gains from the reduced network size. In some cases, this problem may be circumvented by using specialized neuromorphic hardware (Indiveri et al., 2011; Plana et al., 2011; Lin et al., 2018; Shahsavari, Devienne, & Boulet, 2019; Rajendran & Alibart, 2016) to model SNNs. However, this is not always practical or possible (Wunderlich et al., 2019).

In most cases, it will remain necessary to simulate SNNs on general-purpose computers. In order to be able to build efficient SNNs, we therefore need to understand systematically the computational properties of spiking neurons. Indeed, there exists a large number of variants of spiking models vastly varying in internal complexity (Hodgkin & Huxley, 1952; Izhikevich, 2003; Brunel & van Rossum, 2007; Gerstner, Kistler, Naud, & Paninski, 2014). These various neuronal models are not always developed with computational efficiency as a criterion. Particularly in computational neuroscience, considerations of (the rather vague concept of) biological plausibility are often more important than pure simplicity. Yet in the context of applications of spiking neurons in artificial intelligence (AI), biological plausibility is irrelevant, and the important criteria should be computational cost, ease of implementation, suitability for neuromorphic hardware, and performance. In order to be able to produce maximally parsimonious models, it is essential to first understand the necessary ingredients that enable computation. This will be the focus of this letter.

Concretely, here we investigate the minimal ingredients required for a single neuron to perform a multilabel classification task. The purpose of this task is to classify incoming spatiotemporal patterns into different classes and distinguish them from noise. In the SNN literature, there have been a number of attempts to solve variants of this task, including the remote supervised method (ReSuMe; Ponulak, 2005), the chronotron learning rule (Florian, 2012), the spike pattern association neuron (Mohemmed, Schliebs, Matsuda, & Kasabov, 2012), and precise spike-driven synaptic plasticity (Yu, Tang, Tan, & Li, 2013). For this letter, we focus on one of the most recent approaches: the multispike tempotron (MST; Gütig, 2016). This is a single neuron architecture that can be trained to distinguish fixed spatiotemporal patterns from (statistically indistinguishable) noise. Crucially, the MST can also classify patterns into different classes, where the class label is indicated by the number of output spikes released during the duration of the pattern. Noise is always considered as class 0; that is, the MST should not spike when presented with noise.

The neural dynamics of the MST can be summarized as follows: (1) The state of the neuron is defined by the value of the internal membrane potential $V(t)$. It is updated in discrete time. Spikes are generated when $V(t)$ crosses a set threshold value from below. (2) The inputs to the MST neuron are $N$ (unmodelled) presynaptic neurons, spiking with a set frequency. (3) The MST does not directly accept spiking input, but it includes a preprocessing step whereby input spikes are converted into analog signals via a biexponential synapse. This could be interpreted as an additional layer of trivial, capacitor-like buffer neurons. (4) The membrane potential update rule of the MST takes into account the total spiking history of the presynaptic neurons. As a result, the future evolution of the neuron does not solely depend on the current state and inputs; it also depends on how the current state was reached. The update function also includes a constant decay of the membrane potential and a “soft” exponential reset following a spike. Unlike, for example, the leaky integrate-and-fire (LIF) neuron, the MST does not have an immediate hard reset to the resting potential or a refractory period.

In summary, while the MST is a powerful neuronal model, it is also internally complex. The question we address here is whether this internal complexity is necessary for the ability of the MST to learn. We find that it is not. Much simpler models can learn at least equally well. To show this, we systematically strip away features from the MST and check whether this has an impact on its ability to learn. To this end, we propose a family of generalized neuronal models (GNM), which is a special case of the well-known spike-response model (Gerstner & Kistler, 2002a; Jolivet, Lewis, & Gerstner, 2003). The GNM contains a number of readily interpretable parameters that we vary systematically in order to explore putatively crucial features of the model. Depending on how the parameters are set, we can approximate the MST model or implement radically simpler models. The most important parameters that we shall find are the spikiness of the GNM and its memory.

Using a rigorous exploration of the GNM parameter space, we find that most of the complexities of the MST neuron are not essential for learning. Indeed, there is no strict need for spiking, and the soft reset is not important. However, we do find that a balanced amount of memory of past states (that is, a degree of temporal autocorrelation of the membrane potential) is crucial for learning. Interestingly, we identify the hard reset of the well-known LIF neuron (Gerstner et al., 2014), which erases any memory of prespike states and thereby destroys the correlation of postspike and prespike membrane potentials, as a hindrance to learning in the single neuron model.

Following common practice in SNNs, we assume that the GNM is updated in discrete time. However, we also find that a continuous time version of the GNM can classify well. This continuous time version is theoretically interesting because it lends itself to an interpretation as a chemical reaction network (CRN). We conclude our letter by showing how a very simple chemical system can be trained and used to perform multilabel classification.

## 2 Methods

### 2.1 The Generalized Neuron Model

When $\eta >0$, an additional time-dependent decay rate $R$ becomes relevant. $R$ can be thought of as describing the number of ion channels that open only after the membrane potential approaches the threshold $\vartheta_B$ and close, stochastically, with a rate $\beta $. Alternatively, this can be understood as the simplest model that implements a soft postspike reset.

At the start of a simulation, $R(0)$ will be set to 0. The subsequent increase of $R(t)$ depends on the membrane potential via a Hill function (the first term on the right-hand side of equation 2.1a), which is a sigmoidal activation function. As $h\to \infty $, the Hill function approaches a step function with a transition at the point $V=\vartheta_B$. Even for finite values of $h$, the Hill function will be close to zero (one) when the membrane potential $V(t)$ is below (above) $\vartheta_B$. Note that the decay rate $R$ decays itself with a rate of $\beta $.

The effect of this additional decay mode is that the membrane potential may decay faster after having crossed a threshold value $\vartheta_B$. This introduces a memory about past spike events into the model. The duration of the reset depends on the value of $\beta $ and continues even if the membrane potential falls back below the threshold. Thus, the model has hysteresis (see Figure 1d).
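The behavior of the decay variable $R$ can be illustrated with a minimal Python sketch. It rests on assumptions that should be read as such: the exact form of equation 2.1a is not reproduced here, so the Hill function is taken in its standard form $V^h/(V^h+\vartheta_B^h)$ for $V>0$, and the update $R \leftarrow R + \mathrm{Hill}(V) - \beta R$ is only one plausible reading of the description above. The names `hill` and `decay_trace` are illustrative.

```python
import math


def hill(v, theta_b, h):
    """Standard Hill function v^h / (v^h + theta_b^h); assumed form, for v > 0."""
    if v <= 0.0:
        return 0.0  # assumption: no contribution at non-positive potential
    x = h * math.log(v / theta_b)
    # Numerically stable logistic of x, equal to v^h / (v^h + theta_b^h).
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decay_trace(v_trace, theta_b=1.0, h=20.0, beta=0.3):
    """Assumed update R <- R + hill(V) - beta * R, starting from R(0) = 0."""
    r = 0.0
    trace = []
    for v in v_trace:
        r = r + hill(v, theta_b, h) - beta * r
        trace.append(r)
    return trace


# The potential sits below, then above, then again below the threshold.
v_trace = [0.2] * 3 + [1.5] * 3 + [0.2] * 5
r_trace = decay_trace(v_trace)
```

In this sketch, `r_trace` stays positive for several steps after the potential has dropped back below the threshold and shrinks by a factor of $(1-\beta)$ per step, which is the hysteresis described in the text.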

Before continuing, it is useful to discuss briefly the relation between the GNM model and other well-known neuronal models. Unlike the MST, the update rule of the GNM is purely state dependent. The $\eta $ parameter can be seen as regulating the “spikiness” of the model. The soft reset of the MST can be simulated in the GNM when $\eta >0$ and the values of the parameters $\beta ,\zeta $ are set appropriately. The well-known LIF neuron behaves like the GNM model with $h=\infty $, $\zeta =1$, $\gamma =1/\eta $, and $0<\eta <1$, up to reaching the behavioral threshold $\vartheta_B$, including the postspike reset. However, following the reset, the LIF undergoes a deterministic refractory period, during which it remains insensitive to inputs. This type of fixed refractory period cannot be simulated by the GNM.

### 2.2 Quantifying “Spikes”

In the discrete time version of the GNM, “spikes” are determined by counting how often the membrane potential $V(t)$ (or decay $D(t)$) crosses the readout threshold $\vartheta_R$ from below. In multilabel classification, this value indicates class membership. In the case of a single pattern, we require it to cross the threshold exactly once. The error is then simply the difference between a target number of output spikes and the actual number of spikes during $M$ time bins of a pattern.
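The readout just described can be sketched in a few lines of Python. The function names are illustrative, and the convention that a crossing occurs when the trace moves from strictly below the threshold to at or above it is an assumption; the text only specifies crossings "from below".

```python
def count_upward_crossings(trace, theta_r):
    """Count how often a discrete-time trace crosses theta_r from below."""
    count = 0
    for prev, curr in zip(trace, trace[1:]):
        # Assumed convention: strictly below at t, at or above at t + 1.
        if prev < theta_r <= curr:
            count += 1
    return count


def spike_count_error(trace, theta_r, target):
    """Error as target number of spikes minus actual number of spikes."""
    return target - count_upward_crossings(trace, theta_r)
```

Under this convention, a trace that stays above the threshold for the whole pattern counts as zero spikes, because it never crosses from below.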

### 2.3 Training Algorithms

We use three different training algorithms to compare the GNM with the MST. First, we use a version of the eligibility-based ALL algorithm proposed by Gütig (2016), adapted to the GNM. We also show that there are alternative algorithms that can be applied to the GNM and provide better performance.

#### 2.3.1 Aggregate-Label Learning (ALL)

#### 2.3.2 Error Trace Learning (ET)

#### 2.3.3 Error Trace Backpropagation

### 2.4 Momentum Heuristic

## 3 Results

### 3.1 Aggregate Label Learning in the GNM

We first tested how well the discrete GNM can learn a single spatiotemporal pattern. Such a pattern is a temporal sequence of $M$ binary strings of length $N$. Throughout this letter, we have kept $N=100$ and $M=50$. Patterns were generated randomly by drawing each of the bits from a Bernoulli distribution with $p(1)=0.005$. In addition to randomly generated but fixed patterns, we expose the neuron to a stream of noisy background activity. The random activity is generated in the same way as the patterns, but unlike the patterns, which are fixed, the noise is drawn anew at each time bin. As a consequence, the statistical properties of noise and pattern are identical in this setup.
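A minimal sketch of this input construction, using only the Python standard library, is given below. The helper names and the way a pattern is placed between stretches of fresh noise are illustrative assumptions; only $N=100$, $M=50$, and $p(1)=0.005$ come from the text. With these values a pattern contains $100 \times 50 \times 0.005 = 25$ input spikes on average.

```python
import random

N = 100          # number of input channels
M = 50           # number of time bins per pattern
P_SPIKE = 0.005  # Bernoulli probability of a 1 in each bit


def random_frame(rng, n=N, p=P_SPIKE):
    """One time bin of input: n independent Bernoulli(p) bits."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def random_pattern(rng, m=M, n=N, p=P_SPIKE):
    """A fixed pattern: m time bins drawn once and then reused."""
    return [random_frame(rng, n, p) for _ in range(m)]


def embed_pattern(pattern, noise_before, noise_after, rng):
    """Place a fixed pattern between stretches of freshly drawn noise."""
    stream = [random_frame(rng) for _ in range(noise_before)]
    stream += [row[:] for row in pattern]
    stream += [random_frame(rng) for _ in range(noise_after)]
    return stream
```

Because noise frames and pattern frames come from the same generator, nothing but the repetition of the fixed pattern distinguishes it from the background.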

The first task we set is as follows: the GNM should respond with exactly one spike if the input is a pattern and should stay inactive otherwise (i.e., if presented with noise). Unlike the MST, the GNM does not have discrete output spikes. In order to interpret the output of the GNM, we thus need to set an (arbitrary) readout threshold value $\vartheta_R$. The response of the GNM is determined by the number of times the membrane potential $V(t)$ (or decay $D(t)$) crosses $\vartheta_R$ from below within the duration of the pattern (see Figure 1b). This number is used to indicate class membership. In the case of a single pattern, we require it to cross the threshold exactly once.

We train the neuron using the ALL algorithm (see section 2.3.1). During training, the neuron is presented with a random number of spatiotemporal patterns embedded into noisy background activity at random times. At the end of a trial, the GNM receives feedback indicating whether it has released too many or too few spikes. In all experiments, we use the following parameters: $\beta =0.3$, $\zeta =1$, $\gamma =1$, which allow the neuron to exhibit a postspike reset closely resembling that of the MST. For each target pattern, we sampled 41 different values of each of the parameters $\alpha $ and $\eta $ (1681 parameter combinations altogether) in order to test the performance of the GNM. We systematically varied the model choice parameter $\eta $ from 0 (no spikiness) to 1 (complete spikiness) and the decay rate $\alpha $ from 0 (complete memory) to 1 (no memory). For each combination of parameters, we trained the GNM over 60,000 epochs (trials consisting of a random number of patterns embedded into randomly generated noisy background activity) with a learning rate $\lambda =0.0001$.
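The parameter sweep can be written down as a small configuration sketch. Evenly spaced sampling (a step of 0.025) is an assumption made here for illustration; the text states only that 41 values of each parameter between 0 and 1 were used.

```python
# Assumed: 41 evenly spaced values in [0, 1] for each parameter.
alphas = [i / 40 for i in range(41)]  # decay rate: 0 = complete memory, 1 = none
etas = [i / 40 for i in range(41)]    # spikiness: 0 = none, 1 = complete

# Fixed parameters taken from the text.
fixed = {"beta": 0.3, "zeta": 1.0, "gamma": 1.0,
         "epochs": 60_000, "learning_rate": 0.0001}

# One training run per (alpha, eta) combination.
grid = [(alpha, eta) for alpha in alphas for eta in etas]
```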

In order to determine the quality of learning, we subjected the trained GNM to a stream of noise with randomly interspersed target patterns. If the GNM is working correctly, it should respond not to the noise but to the pattern. In practice, GNMs will not function perfectly. In order to quantify the classification reliability of the neuron, we recorded the number of random inputs given to the GNM before the GNM failed and averaged this number over 100 repetitions. We henceforth refer to this as the *noisy performance measure* and use it as an indicator of the quality of the GNM solution. Here, a higher noisy performance is better.
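The measure can be sketched as follows. This is a hedged illustration, not the evaluation code behind the reported results: the callable `responds_correctly` stands in for presenting one input to a trained neuron and checking its response, `make_run` is a hypothetical factory producing one such callable per repetition, and the cap `max_inputs` is an added safeguard against runs that never fail.

```python
def inputs_until_failure(responds_correctly, max_inputs=10_000):
    """Number of inputs handled correctly before the first failure."""
    n = 0
    while n < max_inputs and responds_correctly(n):
        n += 1
    return n


def noisy_performance(make_run, repetitions=100, max_inputs=10_000):
    """Average number of inputs survived over independent repetitions."""
    counts = [inputs_until_failure(make_run(r), max_inputs)
              for r in range(repetitions)]
    return sum(counts) / repetitions
```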

For $\alpha ,\eta \approx 0$, corresponding to the top left corner, noisy performance is low (i.e., the GNM does not classify well). The reason for the poor classification can be understood easily. In this region, the decay of the membrane potential $V(t)$ is low and the GNM integrates over all past events. The membrane potential remains in a permanent superthreshold state and thus is unable to cross the threshold $\vartheta_R$ from below (or indeed from above).

Allowing some leak by increasing $\alpha $ while keeping $\eta $ at 0 (i.e., going down the left-most column in Figure 4) improves the performance dramatically. For example, when $\alpha $ is adjusted from 0.05 to just 0.08, the noisy performance increases from approximately 0 to the global best. As $\alpha $ approaches 1, the performance decreases again. Therefore, for the subfamily of GNM models with $\eta =0$, there must be a value of $\alpha $ that optimizes the learning, although this optimum is not well resolved. Note that in the region $\eta =0$, which we have considered so far, the neural dynamics is reduced to $V_i(t+1)=V_i(t)+I-\alpha V_i(t)$. As such, it lacks entirely the features that are usually associated with spiking neurons, including discrete spikes and an activation threshold.
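The reduced $\eta=0$ dynamics is simple enough to simulate in a few lines. The sketch below iterates the update rule quoted above for a single neuron; treating the input $I$ at each time bin as a weighted sum of the binary input bits is an assumption made for illustration, and the function name is hypothetical.

```python
def simulate_eta0(frames, weights, alpha, v0=0.0):
    """Iterate V <- V + I - alpha * V over a sequence of binary input frames."""
    v = v0
    trace = []
    for frame in frames:
        # Assumed input: weighted sum of the bits active in this time bin.
        current = sum(w * x for w, x in zip(weights, frame))
        v = v + current - alpha * v
        trace.append(v)
    return trace
```

Two limits are easy to check by hand. With `alpha=1.0` the potential equals the current input and carries no memory, while with `alpha=0.0` it is the running sum of all past inputs and never decays. For a single unit input followed by silence and `alpha=0.3`, the trace is 1, 0.7, 0.49, that is, a geometric decay with factor $(1-\alpha)$.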

A behavioral threshold $\vartheta_B$ is introduced to the GNM by increasing the spikiness parameter $\eta $. Figure 4 reveals that high performance of the model concentrates along a fuzzy line of combinations of $\alpha $ and $\eta $; we henceforth refer to this as the *optimal line* (see Figure 4b). However, performance along this line does not significantly increase for $\eta >0$ relative to the nonspiking case of $\eta =0$. Indeed, we can see that the GNM performance drops for extreme values of $\eta \approx 1$. In this regime, the threshold dynamics dominates, and the membrane potential is constrained to a small range of values, making learning impossible. Based on this, we conclude that at least for the case of a single pattern, introducing spikiness does not bring any benefits. Globally best performance can be achieved for $\eta =0$ at $\alpha \simeq 0.3$.

There is also an appealing conjecture for the origins of the optimal line. We observe from equation 2.1a that the decay is effectively reduced by $(1-\eta )$. The optimal line can then be interpreted as a consequence of there being an optimal value for the parameter $\alpha $. To see this, assume that this optimal value is given by $\alpha =\alpha ^{*}$. Assume further that the actual value of $\alpha $ is set to $\alpha '>\alpha ^{*}$. A suitable choice of $\eta $ satisfying $(1-\eta )=\alpha ^{*}/\alpha '$ can effectively offset the nonoptimal choice of $\alpha $ back to the optimal value. If that is true, it would generate precisely the observed optimal line in the parameter space portrait. Beyond this correction of the decay parameter, an increased spikiness has no apparent benefit.
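As a worked instance of this conjecture, take the best value found for $\eta =0$, $\alpha ^{*}\simeq 0.3$, and suppose, purely for illustration, that the decay rate is set to $\alpha '=0.6$. Requiring the effective decay $(1-\eta )\alpha '$ to equal $\alpha ^{*}$ gives

$$
(1-\eta )\,\alpha ' = \alpha ^{*}
\quad\Longrightarrow\quad
\eta = 1-\frac{\alpha ^{*}}{\alpha '} = 1-\frac{0.3}{0.6} = 0.5 .
$$

Under this reading, the combinations $(\alpha ',\eta )$ with $\alpha '\geq \alpha ^{*}$ and $\eta =1-\alpha ^{*}/\alpha '$ all share the same effective decay, which is what the conjecture offers as the origin of the optimal line.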

### 3.2 Multipattern Learning in the GNM

So far, we have tested only how the GNM learns a single pattern. The key achievement of the MST is that a single neuron can learn to recognize multiple patterns and multiple classes of patterns. For example, there may be a set of patterns to which the MST responds with one spike and a set of patterns to which it responds with two spikes and so on. We now test whether the GNM can do the same. Similar to before, we interpret the GNM as “spiking” $n$ times if during the presentation of the pattern, the membrane potential crosses the readout threshold $\vartheta_R$ from below $n$ times.

### 3.3 Comparison to Other Training Methods and Neural Models

We have found that the GNM is competitive with the MST. However, the comparison is unfair because we compared a large number of simulations to just a single parameter setting of the MST. We now investigate the performance of the GNM relative to other models with more rigor.

In addition to comparing the GNM to other neural models trained using the same algorithm, we also contrasted the performance of the original ALL algorithm with training techniques involving more information about the time of the error (i.e., error trace learning; see section 2.3.2). We find that the error trace learning algorithm has consistently outperformed ALL (see Figure 7). Out of 50 training simulations that we conducted (for a GNM neuron with parameters $\eta =0$, $\alpha =0.3$), 46 achieved a better result in terms of noisy performance when using the ET method. This tells us that despite the fact that the ALL is an elegant and simple training rule, it is also suboptimal.

Moreover, we propose an extension of the ET rule to setups of multilayer networks. The error backpropagation algorithm (see section 2.3.3) can be applied directly to the GNM (see Figure 7). However, for the present task, we could not find any benefits in applying backpropagation. This is not to say that backpropagation cannot be beneficial in SNNs for more complex problems. Exploring this is beyond the scope of this letter, and we leave it to future research.

### 3.4 Interpreting GNM as a System of Chemical Reactions

A common assumption in the SNN literature, including the MST, is that the input channels are clocked; that is, the model is updated in discrete time. It is straightforward to extend the GNM model to the continuous case (corresponding to equation 2.1c). We found that training the model in continuous time yielded qualitatively the same results as the discrete time case. We demonstrate the feasibility of this interpretation by training a continuous-time version of the GNM to recognize two classes of patterns (see Figure 2). The extension to the continuous case is interesting because then the GNM model with $\eta =0$ (see equation 2.1d) can be interpreted as the description of a molecular species $V$ that decays with a rate of $\alpha $. The input $I$ is then mathematically equivalent to $N$ different input chemical species $I_i$, each decaying to $V$ with a rate of $w_iC$ and to a null species with a rate of $(1-w_i)C$. In this case, the constant $C$ sets the time scale of the decay and could be the same for all presynaptic neurons. Having interpreted the GNM as a chemical system, we can then test its ability to recognize patterns by solving the differential equation 2.1d.
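The reaction scheme in this paragraph translates, under mass-action kinetics, into $\dot I_i=-CI_i$ and $\dot V=C\sum_i w_iI_i-\alpha V$. The sketch below integrates these equations with a plain forward Euler step. It is an illustration under assumptions rather than a reproduction of equation 2.1d: input spikes are modeled as instantaneous additions to the species $I_i$, the weights are taken to lie in $[0,1]$ so that $(1-w_i)C$ is a valid rate, and the function name and argument layout are hypothetical.

```python
def simulate_crn(weights, pulses, alpha, c=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of the assumed mass-action equations.

    pulses maps a step index to a list of (species_index, amount) pairs,
    added instantaneously to the input species at that step.
    """
    species = [0.0] * len(weights)  # concentrations of the input species I_i
    v = 0.0                         # concentration of the species V
    v_trace = []
    for k in range(steps):
        for i, amount in pulses.get(k, []):
            species[i] += amount
        flux = [c * s for s in species]  # total decay flux of each I_i
        inflow = sum(w * f for w, f in zip(weights, flux))  # share going to V
        v += dt * (inflow - alpha * v)
        species = [s - dt * f for s, f in zip(species, flux)]
        v_trace.append(v)
    return v_trace, species
```

With a single input of weight 1 and $\alpha =0$, the sum of `v` and the remaining input species stays constant, so nothing is created out of nothing; with weight 0.25 only a quarter of the input ever reaches $V$, and any $\alpha >0$ lets $V$ leak away again.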

## 4 Discussion

In this letter, we have probed the minimal ingredients necessary for neural computation in the context of multilabel classification of spatiotemporal patterns. We introduced the GNM, which can solve multilabel classification tasks at least as well as the MST, while being purely state based. The model also has a conservation of membrane potential built in. This does not preclude leakage of membrane potential, but it prevents its creation out of nothing. In that sense, the model is physically plausible, which allows it to be interpreted in terms of concrete implementations (see below).

The dynamics of the GNM are simple and its parameters easily interpretable, which supports an intuitive understanding of what precisely it is that makes single neuron classification work. The parameter $\alpha $ can be interpreted as a memory. It determines how much membrane potential is leaked between two update steps. In the extreme case of $\alpha =1$, the neuron is reset during each time step and has no memory of past inputs. In this limit, the GNM is reduced to a standard rate-coding neuron. The performance of the GNM is substantially decreased, but some learning is still possible. For $\alpha =0$, the neuron integrates over all past events and never forgets. It is clear from both basic considerations and our simulation results that this latter limit does not allow the GNM to recognize patterns. In between those two extremes is an optimal value for the memory of the GNM. Figures 4c and 6a suggest that the model is not particularly sensitive to the memory parameter, at least not for intermediate values. Interestingly, however, our simulations also suggest that at the lower end of the parameter range, there is a critical value of $\alpha $ that separates almost perfect ability to learn from complete nonperformance.

The second key parameter is the model choice parameter $\eta $. It controls the extent to which the neural dynamics is affected by an internal threshold and hysteresis; in short, how much spikiness the neural dynamics exhibits. For $\eta =0$, the internal dynamics is a simple exponential decay with rate $\alpha $. No thresholds are defined internally, and there is no spiking whatsoever. Note, however, that the use of the neuron still requires an evaluation threshold $\vartheta_R$ to be set in order to be able to interpret the membrane potential of the GNM as indicating the pattern class.

Increasing $\eta $ introduces an additional behavioral threshold parameter, $\vartheta_B$, which now *does* have an impact on the internal dynamics of the GNM. As the membrane potential nears $\vartheta_B$, an additional decay term $R$ becomes relevant, such that after crossing the threshold, the decay may be higher than before crossing the threshold. This endows the GNM with spikiness and, most of all, with a time-limited memory of past spiking events.

We found that model performance was consistently best along an off-center diagonal in the lower left quadrant of the heat maps (see Figures 4 and 5). Crucially, however, there is no consistent evidence for an optimal point along this diagonal. The conclusion to draw from this is that it is sufficient to consider the reduced parameter space corresponding to $\eta =0$—the case of no spikes. Put differently, there does not appear to be any benefit in spiking.

The existence of the optimal line provides some insights into the necessary ingredients for spiking networks in that it points to the memory parameter $\alpha $ as the main determinant of performance. The optimal line, albeit not very well defined in our models, is the line of constant memory, because the factor $(1-\eta )$ effectively reduces the memory parameter $\alpha $. Therefore, it appears from our simulations that there is an optimal memory for the performance of the GNM which lies in between the extreme and nonperforming cases of $\alpha =0$ and $\alpha =1$ corresponding to no forgetting and no memory at all. However, note that the model is not particularly sensitive to the precise value of $\alpha $, such that there is a range of values for which performance is good.

The conclusion that the GNM can perform with no spiking opens an interesting perspective. For $\eta =0$, the GNM model, equation 2.1d, looks formally like the time evolution of the concentration $V$ of a molecular species subject to decay, $V\xrightarrow{\alpha }\varnothing $, plus occasional instantaneous increases of $V$, $V\longrightarrow kV$. Fundamentally, such a chemical system is an extremely simple system that can be implemented easily in (wet) experiments. Yet as we show, this simple system is sufficiently rich in its dynamics to perform the multilabel classification as well as the specialized MST model. All the chemical system retains in common with the MST is the temporal correlation of the input. This leads us to conjecture that this temporal autocorrelation is a crucial element for multilabel classification of spatiotemporal patterns.

This formal equivalence of GNM and chemical systems raises the question of whether there actually are man-made or naturally occurring chemical systems that recognize spatiotemporal patterns. An obvious place to look for such systems is in biochemical networks. It is conceivable that multilabel classification is exploited by gene regulatory networks to control gene expression by means of sequences of gene expression events.

Having shown that very simple systems can perform multilabel classification, it is instructive to compare the GNM and MST to another very simple model of a spiking neuron: the LIF neuron. The LIF neuron is different from both the GNM and the MST in that it has a hard reset following a spike and typically undergoes a refractory period. The refractory period, together with lateral inhibition, is a useful feature in the context of STDP learning (Feldman, 2012; Gerstner & Kistler, 2002b), where it can be used to facilitate winner-takes-all dynamics in multilayer SNNs, which in turn is important to prevent all postsynaptic neurons from learning the same parts of the input. Beyond that, it is unclear whether there is a computational benefit in the refractory period.

Our simulations showed (see Figure 8) that the LIF has a comparable performance to the MST/GNM when classifying a single pattern, but its ability to learn drops for multipattern classification. This is understandable because the refractory period effectively shortens the time the LIF is able to react to incoming signals, thus making it hard for the neuron to activate several times during a limited period. Yet as our simulations show (see Figure 8), the performance of the LIF neuron is worse even for a refractory period of length 0. Once the refractory period is removed, the only remaining difference between the LIF and the GNM is the hard reset. Note that this hard reset effectively destroys the temporal autocorrelation of the membrane potential. Hence, the observation that the LIF neuron performs worse than the GNM supports further our conclusion that a balanced memory of the membrane potential is required for good performance on multilabel classification of spatiotemporal signals. This now raises the question of whether biological neurons, which clearly do have a refractory period, are suboptimal components. We do not believe that this conclusion can be drawn, because brains operate in a different context from the restricted problem set that we considered here. Moreover, the refractory period in real neurons may well be a reflection of some resource limitations or physical constraints that we have not considered here, thus making a comparison invalid.

Throughout this letter we have evaluated the GNM assuming an aggregate label delayed feedback learning rule during training. This training method is mainly motivated by its biological plausibility. In applications of the GNM/MST in the context of AI, biological plausibility is not a relevant criterion. We found, perhaps rather unsurprisingly, that dropping the requirement of aggregate label delayed feedback in favor of more immediate and information-rich feedback led to increased model performance (see section 3.3).

Once we allow such direct error feedback, we can extend it to backpropagation-based training methods in networks of GNMs. We demonstrate the feasibility of this approach by solving the multilabel classification task using a layered network of GNMs with 10 hidden neurons (see Figures 3 and 7). Deep learning with SNNs could lead to substantive benefits in terms of smaller models and more efficient hardware if only it is possible to transfer established deep learning techniques to spiking architectures. We leave it to future research to establish whether the GNM or similar spiking architectures could indeed be a credible alternative for existing deep architectures.

## 5 Conclusion and Outlook

Gütig's multispike tempotron is a powerful single neuron model that can classify spatiotemporal patterns into multiple classes. The model is also complicated to implement. Here, we showed that the much simpler GNM neuronal model can achieve the same performance as the MST. Our results indicate that the important feature of neuronal models is the temporal autocorrelation of the membrane potential; that is, how quickly the neuron forgets about past inputs. We found that for intermediate values, the model performance is maximized. It remains an open question for future research whether this conclusion is specific to the particular task we considered, or whether the optimal memory emerges as *the* crucial parameter in all applications. If the power of SNNs is to be leveraged in practical AI applications, then it will be necessary to understand the minimal spiking neuron that is sufficient for a particular task so as to be able to build resource-efficient systems.