Abstract

Researchers building spiking neural networks face the challenge of improving the biological plausibility of their model networks while maintaining the ability to quantitatively characterize network behavior. In this work, we extend the theory behind the neural engineering framework (NEF), a method of building spiking dynamical networks, to permit the use of a broad class of synapse models while maintaining prescribed dynamics up to a given order. This theory improves our understanding of how low-level synaptic properties alter the accuracy of high-level computations in spiking dynamical networks. For completeness, we provide characterizations for both continuous-time (i.e., analog) and discrete-time (i.e., digital) simulations. We demonstrate the utility of these extensions by mapping an optimal delay line onto various spiking dynamical networks using higher-order models of the synapse. We show that these networks nonlinearly encode rolling windows of input history, using a scale invariant representation, with accuracy depending on the frequency content of the input signal. Finally, we reveal that these methods provide a novel explanation of time cell responses during a delay task, which have been observed throughout hippocampus, striatum, and cortex.

1  Introduction

One of the central challenges in computational neuroscience is understanding how dynamic stimuli can be processed by neural mechanisms to drive behavior. Recurrent connections, cellular responses, and synaptic responses are ubiquitous sources of dynamics throughout the mammalian brain that must work in concert to support dynamic information processing (Kandel, Schwartz, & Jessell, 2000). How these low-level mechanisms interact in order to encode information about the history of a stimulus, across time, is the subject of considerable study. One approach to better understanding these mechanisms is to construct models that capture central features of neural dynamics while implementing higher-level information processing.

The neural engineering framework (NEF; Eliasmith & Anderson, 1999, 2003) proposes a method to model such dynamical systems in networks of spiking neurons (see Abbott, DePasquale, & Memmesheimer, 2016; Denève & Machens, 2016, for reviews of other methods). The NEF has been used to construct a wide variety of neural models, including a 2.3 million neuron functioning model of the human brain, capable of performing perceptual, motor, and cognitive tasks (Eliasmith et al., 2012). This model incorporates many kinds of observed neural dynamics, including oscillations, sustained activity, and point attractor dynamics. The flexibility of the NEF has led to its deployment on mixed-analog-digital neuromorphic chips (Choudhary et al., 2012; Corradi, Eliasmith, & Indiveri, 2014; Voelker, Benjamin, Stewart, Boahen, & Eliasmith, 2017; Voelker & Eliasmith, 2017) and digital architectures (Bekolay et al., 2013; Wang, Hamilton, Tapson, & van Schaik, 2014; Mundy, Knight, Stewart, & Furber, 2015; Berzish, Eliasmith, & Tripp, 2016). Consequently, the NEF provides a practical method for programming neuromorphics, thus helping the field realize its promise of a low-energy computing platform that emulates core principles of the nervous system (Boahen, 2017).

However, the NEF typically assumes an exponential model of the postsynaptic current (PSC) evoked by an action potential, which has a biologically implausible, instantaneous rise time. This model is also known as a first-order low-pass filter. In contrast, the synapse models used in mixed-analog-digital neuromorphic chips are nonideal, featuring higher-order dynamics due to parasitic capacitances (Voelker, Benjamin et al., 2017). Similarly, the synapse models commonly used in biological models incorporate distinct rise and fall times due to separate timescales of transmitter binding and unbinding, as well as axonal transmission delays due to the finite-velocity propagation of action potentials (Roth & van Rossum, 2009). To widen the scope of the NEF, we characterize the network-level effects of these higher-order synapse models and harness them to implement certain classes of dynamical systems with improved accuracy.

A particularly important dynamical system that has not been implemented using the NEF is the pure continuous-time delay line. This system must represent a rolling window of input history. We provide a novel derivation of an optimal low-dimensional linear approximation to a continuous-time delay and prove that the resulting delay network nonlinearly encodes its input across the delay interval. This network uses a scale-invariant representation, with a level of accuracy that depends on the input frequency, chosen dimensionality (i.e., the order of the approximation), and particular synapse model. Low-dimensional representations of low-frequency signals are pervasive in biological systems (Cunningham & Byron, 2014; Waernberg & Kumar, 2017; Pulvermüller, Birbaumer, Lutzenberger, & Mohr, 1997; Singer, 1999). To our knowledge, this work is the first to demonstrate that such a temporal code may be accurately implemented using a spiking dynamical network.

Reservoir computing approaches, such as liquid state machines (Maass, Natschläger, & Markram, 2002) and echo state networks (Jaeger, 2001), may be used to approximate a delay line. However, since these networks use randomly chosen feedback weights, it is likely that they do not efficiently produce the dynamics of a pure delay. Rather, these networks represent a random variety of nonlinear memory traces (Lukoševičius, 2012). Discrete approaches to short-term memory, such as those taken by White, Lee, and Sompolinsky (2004) and Ganguli, Huh, and Sompolinsky (2008), while optimal in an information-theoretic sense, rely fundamentally on single time-step delays between rate-based neurons. In contrast, the method that we propose here works independently of the simulation time step, and is optimal assuming the population of spiking neurons—coupled with some model of the synapse—accurately represents a low-dimensional, low-frequency vector space. Furthermore, our framework is extended to account for arbitrary linear synapses, which improves our understanding of the relationship between synapse models and network-level computation. A detailed comparison of our method to reservoir computing remains the subject of future work.

A distinguishing feature of this work is that we begin the problem with a mathematical description of the ideal dynamics for the delay line and then proceed by mapping this description onto a spiking neural network (using the NEF and its extensions). This is in contrast to methods that use online learning or backpropagation through time with rate-based neurons (De Vries & Principe, 1992; Sussillo & Abbott, 2009) or spiking neurons (Nicola & Clopath, 2016; Huh & Sejnowski, 2017; Gilra & Gerstner, 2017; Alemi, Machens, Denève, & Slotine, 2017). That is, we do not require any sophisticated training regimes involving online learning or backpropagation; our delay network is trained offline using convex optimization (i.e., regularized least squares), which yields a more complete understanding of its employed representation and dynamics.

The remainder of this article is structured as follows. In section 2, we introduce the NEF, placing an emphasis on the mapping of continuous-time linear systems onto the canonical low-pass model of the synapse. In section 3.1, we use the NEF to derive a spiking dynamical network that delays its input signal by some fixed length of time and then analyze this network in section 3.2 to characterize how it nonlinearly encodes the stimulus across a rolling window. In section 4, we extend the NEF to characterize the effects of higher-order synapse models and account for both continuous-time (i.e., for analog hardware) and discrete-time (i.e., for digital hardware) simulations. In section 5, we exploit the methods from section 4 to demonstrate their utility in computing delays and provide insights into possible neural mechanisms. Specifically, we harness a small delay in the synapse model to improve the accuracy of a larger network-level delay. And we illustrate the relevance of our delay network to biological modeling through qualitative and quantitative comparisons to the responses of time cells (Eichenbaum, 2014), suggesting a new characterization of how temporal representations may arise in the brain.

2  Neural Engineering Framework

The neural engineering framework (NEF; Eliasmith & Anderson, 1999, 2003) consists of three mathematical principles used to describe neural computation. The NEF is most commonly applied to building dynamic (i.e., recurrent) spiking neural networks, but also applies to nonspiking and feedforward networks. Its primary strength lies in providing a well-understood and efficient means of engineering spiking neural models, and programming neuromorphic hardware, to perform mathematically described computations (Eliasmith, 2013; Boahen, 2017). In this section, we provide an overview of these methods applied to training both feedforward and recurrent connection weights in order to implement linear dynamical systems, although these methods extend to nonlinear dynamical systems as well (Voelker, Benjamin et al., 2017; Voelker & Eliasmith, 2017). We present this framework in a manner that is consistent with the Nengo 2.4.0 simulator (Bekolay et al., 2013), which implements the NEF among other approaches.

2.1  Principle 1: Representation

Let $\mathbf{x}(t) \in \mathbb{R}^d$ denote a $d$-dimensional time-varying signal that is to be represented by a population of $n$ spiking neurons. To describe this representation, we define a nonlinear encoding and a linear decoding that together determine how neural activity relates to the represented vector.

First, we choose encoders $\mathbf{e}_i \in \mathbb{R}^d$, gains $\alpha_i > 0$, and biases $\beta_i \in \mathbb{R}$, $i = 1, \ldots, n$, as parameters for the encoding, which map $\mathbf{x}(t)$ to neural activities. These parameters are fit from neuroanatomical data (e.g., tuning curves, preferred directions, firing rates, sparsity; see Friedl, Voelker, Peer, & Eliasmith, 2016, for a recent example) or randomly sampled from distributions constrained by the domain of $\mathbf{x}$ and the dynamic range of the neuron models. In either case, the encoding is defined by
$$a_i(t) = G_i\big[\alpha_i \langle \mathbf{e}_i, \mathbf{x}(t) \rangle + \beta_i\big], \qquad (2.1)$$
where $a_i(t)$ is the neural activity generated by the $i$th neuron encoding the vector $\mathbf{x}(t)$ at time $t$, $\langle \cdot, \cdot \rangle$ denotes a dot product, and $G_i[\cdot]$ is the nonlinear dynamical system for a single neuron (e.g., a leaky integrate-and-fire (LIF) neuron, a conductance-based neuron). For spiking neuron models, $a_i(t) = \sum_m \delta(t - t_{i,m})$, where $\delta(\cdot)$ is the Dirac delta and $\{t_{i,m}\}$ is the sequence of spike times generated by equation 2.1.

Having defined an encoding, we introduce a postsynaptic filter $h(t)$, which acts as the synapse model by capturing the dynamics of a receiving neuron's synapse. In particular, this filter models the postsynaptic current (PSC) triggered by action potentials arriving at the synaptic cleft. For now, we fix $h(t)$ to be an exponentially decaying PSC with time constant $\tau$, which is equivalent (in the Laplace domain) to the canonical first-order low-pass filter (also known as a leaky integrator). This is the conventional choice of synapse in the NEF, since it strikes a convenient balance between mathematical simplicity and biological plausibility (Eliasmith & Anderson, 2003). In section 4, we return to this point by considering more general synapse models that are capable of capturing a much broader variety of PSCs.

We can now characterize the decoding of the neural response, which determines the information extracted from the neural activities encoding $\mathbf{x}(t)$. Let $\mathbf{d}_i \in \mathbb{R}^d$, $i = 1, \ldots, n$, be the decoders (forming a $d \times n$ decoding matrix) that recover $\mathbf{x}(t)$ from the population's activities at time $t$. This linear decoding is described by
$$(\mathbf{x} * h)(t) \approx \sum_{i=1}^{n} \mathbf{d}_i\, (a_i * h)(t), \qquad (2.2)$$
where $*$ is the convolution operator that is used to apply the synaptic filter. Equation 2.2 takes a linear combination of the filtered activities in order to recover a filtered version of the encoded signal.1 To complete the characterization of neural representation, we solve for the optimal linear decoders $\mathbf{d}_i$. This optimization is identical for principles 1 and 2, as discussed below.

2.2  Principle 2: Transformation

The second principle of the NEF addresses the issue of computing transformations of the represented signal. The encoding remains defined by equation 2.1. However, we now decode some desired function of $\mathbf{x}$, $f(\mathbf{x})$,2 by applying an alternate set of decoders $\mathbf{d}_i^{f}$ to the same activities:
$$(f(\mathbf{x}) * h)(t) \approx \sum_{i=1}^{n} \mathbf{d}_i^{f}\, (a_i * h)(t). \qquad (2.3)$$
For both principles 1 and 2, we optimize for $\mathbf{d}^{f}$ over the domain of the signal, $\mathcal{S} \subseteq \mathbb{R}^d$, which is typically the unit $d$-ball or the unit $d$-cube. To determine these decoders, we first let $r_i(\mathbf{x})$ be the limiting average firing rate of the $i$th neuron under the constant input $\mathbf{x}$:
$$r_i(\mathbf{x}) = \lim_{t \to \infty} \frac{1}{t} \int_{0}^{t} a_i(t')\, dt'. \qquad (2.4)$$
For nonadaptive neuron models, equation 2.4 reduces to encoding $\mathbf{x}$ using a rate model. For adaptive neuron models, other definitions for $r_i(\mathbf{x})$ may be considered, but we limit our discussion here to the (nonadaptive) spiking LIF model. To account for the variance introduced by neural spiking and other sources of uncertainty, we introduce a white noise term $\eta_i$. The optimality criterion for $\mathbf{d}^{f}$ is therefore
$$\mathbf{d}^{f} = \operatorname*{arg\,min}_{\mathbf{d}} \int_{\mathcal{S}} \mathbb{E}_{\eta}\!\left[\, \Big\| f(\mathbf{x}) - \sum_{i=1}^{n} \big(r_i(\mathbf{x}) + \eta_i\big)\, \mathbf{d}_i \Big\|_2^2 \right] d\mathbf{x}. \qquad (2.5)$$
Note that this optimization depends on $\mathbf{x}$ only through its domain $\mathcal{S}$, as opposed to depending on the signal $\mathbf{x}(t)$ itself. In other words, the optimization is determined strictly by the distribution of the signal, and not according to its particular dynamics. Furthermore, this is a convex optimization problem, which may be solved by uniformly sampling $\mathcal{S}$ (Voelker, Gosmann, & Stewart, 2017) and applying a standard regularized least-squares solver to the sampled data (Bekolay et al., 2013). Monte Carlo sampling introduces $O(1/\sqrt{m})$ error into the integral from equation 2.5, where $m$ is the number of samples, but this can be improved to $O(1/m)$—effectively squaring $m$—by the use of quasi–Monte Carlo methods (Fang & Wang, 1994; Knight, Voelker, Mundy, Eliasmith, & Furber, 2016). Nengo also supports alternative decoder solvers that optimize variations of equation 2.5 (e.g., Friedl et al., 2016; Kauderer-Abrams et al., 2017), but we do not use them here.
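
As a concrete illustration of this optimization (not code from the article; all parameter values are assumptions chosen for illustration), the following sketch samples a one-dimensional domain, evaluates LIF rate curves, and solves the regularized least-squares problem of equation 2.5 with NumPy:

```python
import numpy as np

rng = np.random.RandomState(0)
n, d, m = 100, 1, 500                  # neurons, dimensions, sample points (assumed)
tau_rc, tau_ref = 0.02, 0.002          # typical LIF membrane and refractory constants (assumed)

# Randomly chosen encoding parameters (principle 1).
encoders = rng.choice([-1.0, 1.0], size=(n, d))
gains = rng.uniform(0.5, 2.0, size=n)
biases = rng.uniform(-1.0, 1.0, size=n)

def lif_rate(J):
    """Steady-state LIF firing rate for input current J (the rate model of equation 2.4)."""
    out = np.zeros_like(J)
    valid = J > 1
    out[valid] = 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / (J[valid] - 1)))
    return out

# Sample the domain S (here, the interval [-1, 1]) and evaluate the rates.
X = rng.uniform(-1, 1, size=(m, d))
J = gains * X.dot(encoders.T) + biases          # (m, n) input currents
R = lif_rate(J)

# Regularized least squares for the decoders of f(x) = x**2 (equation 2.5).
f_target = X ** 2
sigma = 0.1 * R.max()                           # white noise term modeled as regularization
G = R.T.dot(R) + m * sigma ** 2 * np.eye(n)
decoders = np.linalg.solve(G, R.T.dot(f_target))
print("RMSE:", np.sqrt(np.mean((R.dot(decoders) - f_target) ** 2)))
```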

The accuracy of this approach relies on $r_i(\mathbf{x}(t))$ being a suitable proxy for the filtered activity $(a_i * h)(t)$ whenever $(\mathbf{x} * h)(t) \approx \mathbf{x}(t)$. This zeroth-order approximation clearly holds in the steady state for constant $\mathbf{x}$ and turns out to be ideal in practice for low-frequency $\mathbf{x}(t)$ (Eliasmith & Anderson, 2003, appendix F.1), and likewise for synapses $h(t)$ that filter out high frequencies (i.e., when the synaptic time constant $\tau$ is large).

Equations 2.1 and 2.3 may then be used to derive a connection weight matrix between layers to implicitly compute the desired function within the latent vector space $\mathbb{R}^d$. Specifically, the weight matrix $W = [\omega_{ji}]$, whose entry $\omega_{ji}$ maps activities from the $i$th presynaptic neuron to the $j$th postsynaptic neuron (disregarding the gain, bias, and synapse), is given by
$$\omega_{ji} = \big\langle \mathbf{e}_j,\, \mathbf{d}_i^{f} \big\rangle. \qquad (2.6)$$
Consequently, the encoding and decoding matrices provide a low-rank factorization of $W$. In other words, the process of decoding (see equation 2.3) and then encoding (see equation 2.1) is equivalent to taking the dot product of the full-rank weight matrix with the neural activities.

This factorization has important consequences for the computational efficiency of neural simulations. The crucial difference between the factorized and nonfactorized forms is that it takes $O(nd)$ operations per simulation time step to implement this dot product in the factored form of equation 2.6, as opposed to $O(n^2)$ operations for a full-rank weight matrix. Since $d$ is typically held constant, this yields an $O(n)$ factor improvement in simulation time. Similarly, this factorization yields an $O(n)$ reduction in memory, which significantly improves the scaling of neuromorphics (Mundy et al., 2015). In essence, this factorization provides a way to describe the network's latent state vector $\mathbf{x}(t)$. This, in turn, permits us to perform useful computations by transforming the state vector with $O(1/\sqrt{n})$ error in the presence of spike noise (Eliasmith & Anderson, 2003).
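
A minimal sketch of this equivalence, with assumed sizes and random stand-in parameters, compares the factored update against the explicit full-rank weight matrix:

```python
import numpy as np

rng = np.random.RandomState(1)
n, d = 1000, 6                       # number of neurons and represented dimensions (assumed)
a = rng.poisson(5.0, size=n)         # filtered activities of the presynaptic population
decoders = rng.randn(n, d) / n       # d_i from principle 2 (random stand-ins)
encoders = rng.randn(n, d)           # e_j for the postsynaptic population

# Factored update: O(n*d) operations per time step.
x_hat = decoders.T.dot(a)            # decode the latent state (equation 2.3)
currents_factored = encoders.dot(x_hat)

# Equivalent full-rank update: O(n*n) operations and memory (equation 2.6).
W = encoders.dot(decoders.T)         # omega_{ji} = <e_j, d_i>
currents_full = W.dot(a)

assert np.allclose(currents_factored, currents_full)
```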

2.3  Principle 3: Dynamics

Principle 3 is a method of harnessing the dynamics of the synapse model for network-level information processing. We begin by focusing our discussion of NEF dynamics on the neural implementation of continuous, linear time-invariant (LTI) systems,
$$\dot{\mathbf{x}}(t) = A\mathbf{x}(t) + B\mathbf{u}(t), \qquad \mathbf{y}(t) = C\mathbf{x}(t) + D\mathbf{u}(t), \qquad (2.7)$$
where the time-varying signal $\mathbf{x}(t)$ represents the system state, $\dot{\mathbf{x}}(t)$ its time derivative, $\mathbf{y}(t)$ the output, $\mathbf{u}(t)$ the input, and the time-invariant matrices $(A, B, C, D)$ fully describe the system (Brogan, 1991). This form of an LTI system is commonly referred to as the state-space model, but there are many other equivalent forms, which we will refer to later.
For LTI systems, the dynamical primitive—that is, the source of the dynamics—is the integrator (see Figure 1). However, the dynamical primitive at our disposal is the leaky integrator, given by the canonical first-order low-pass filter modeling the synapse
$$h(t) = \frac{1}{\tau}\, e^{-t/\tau} = \mathcal{L}^{-1}\!\left\{ \frac{1}{\tau s + 1} \right\}, \qquad (2.8)$$
where $\mathcal{L}^{-1}\{\cdot\}$ denotes the inverse Laplace transform.3 To be more precise, our approach is to represent the state vector $\mathbf{x}(t)$ in a population of spiking neurons (principle 1; see equation 2.1) such that this vector is obtained by filtering some linearly decoded spike trains with a leaky integrator (principle 2; see equation 2.3). Thus, the goal of principle 3 is to determine the transformations required to implement equation 2.7, given that $\mathbf{x}(t)$ is obtained by some convolution with a leaky integrator rather than the perfect integrator depicted in Figure 1.
Figure 1:

Block diagram for an LTI system. The integrator $\frac{1}{s}$ is driven by the signal $A\mathbf{x}(t) + B\mathbf{u}(t) = \dot{\mathbf{x}}(t)$.

Principle 3 states that in order to implement equation 2.7 in a population of neurons that represents $\mathbf{x}(t)$, we must compensate for the effect of “replacing” the integrator with a leaky integrator (compare Figures 1 and 2), that is, by driving the synapse with $(\tau A + I)\mathbf{x}(t) + \tau B\mathbf{u}(t)$ instead of only $A\mathbf{x}(t) + B\mathbf{u}(t)$. This compensation is achieved as follows: implement the recurrent transformation $\tau A + I$ and the input transformation $\tau B$, but use the same output transformation $C$ and the same pass-through transformation $D$ (Eliasmith & Anderson, 2003). Specifically, this may be implemented in a spiking dynamical network by representing $\mathbf{x}(t)$ via principle 1 and then using principle 2 to decode the needed transformations. The resulting model is summarized in Figure 2.

Figure 2:

Block diagram for an LTI system, equivalent to Figure 1, with the integrator replaced by a first-order low-pass filter. The low-pass $\frac{1}{\tau s + 1}$ is driven by the signal $(\tau A + I)\mathbf{x}(t) + \tau B\mathbf{u}(t)$ to ensure that it implements the same system as in Figure 1.

The correctness of this “mapping” procedure relies on three assumptions: (1) the synapse model is equation 2.8, (2) the network is simulated in continuous time (or the discrete time step is sufficiently small), and (3) the necessary representations and transformations are sufficiently accurate such that the approximation error from equation 2.3 is negligible. In other words, assuming the number of neurons is sufficiently large, the architecture of Figure 2 is equivalent to Figure 1, but using the leaky integrator instead of an integrator as the dynamical primitive (Eliasmith & Anderson, 2003). Consequently, both systems compute the exact same signals $\mathbf{x}(t)$ and $\mathbf{y}(t)$. In section 4.1, we provide a novel proof of this equivalence. In sections 4.2 to 4.4, we extend principle 3 to remove the first and second assumptions.
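
The following sketch illustrates the principle 3 mapping for an assumed example system (a harmonic oscillator) and verifies, at a sample frequency, that driving the low-pass with the mapped matrices reproduces the original transfer function:

```python
import numpy as np

tau = 0.1                            # synaptic time constant in seconds (assumed)
omega = 2 * np.pi                    # a 1 Hz harmonic oscillator as the example system
A = np.array([[0.0, -omega], [omega, 0.0]])
B = np.array([[1.0], [0.0]])
C = np.eye(2)
D = np.zeros((2, 1))

# Principle 3: the recurrent transform becomes tau*A + I and the input transform tau*B,
# so that the leaky integrator implements the same dynamics as the original integrator.
A_map = tau * A + np.eye(2)
B_map = tau * B

# Sanity check in the Laplace domain at a sample frequency: F^H(1/H(s)) == F(s).
s = 1j * 2 * np.pi * 0.5             # 0.5 Hz
F = C.dot(np.linalg.inv(s * np.eye(2) - A)).dot(B) + D
F_mapped = C.dot(np.linalg.inv((tau * s + 1) * np.eye(2) - A_map)).dot(B_map) + D
assert np.allclose(F, F_mapped)
```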

Principle 3 is useful for accurately implementing a wide class of dynamical systems (e.g., integrators, oscillators, attractor networks) to solve specific problems that frequently arise in neural modeling (e.g., Eliasmith & Anderson, 2000; Singh & Eliasmith, 2004, 2006; Eliasmith, 2005). Furthermore, the class of state-space models is isomorphic to the class of all finite-dimensional causal linear filters or, equivalently, all rational (finite-order) proper transfer functions, which is a large and useful class of dynamical systems employed widely in control applications (Brogan, 1991). Given the ability of principle 2 to compute nonlinear functions (see equation 2.3), principle 3 also naturally generalizes to nonlinear dynamical systems, but this is outside the scope of this work; we refer readers to Voelker, Benjamin et al. (2017) and Voelker and Eliasmith (2017) for recent nonlinear extensions.

3  Spiking Delay Networks

A fundamental dynamical system that has not yet been discussed within the NEF literature is the continuous-time delay line of $\theta$ seconds,4 expressed as
$$y(t) = (u * \delta_\theta)(t) = u(t - \theta), \qquad \theta > 0, \qquad (3.1)$$
where $\delta_\theta(t) := \delta(t - \theta)$ denotes a Dirac delta function shifted backward in time by $\theta$. This system takes a time-varying scalar signal, $u(t)$, and outputs a purely delayed version, $u(t - \theta)$. The task of computing this function both accurately and efficiently in a biologically plausible, spiking, dynamical network is a significant theoretical challenge that, to our knowledge, has previously remained unsolved.

The continuous-time delay is worthy of detailed consideration for several reasons. First, it is nontrivial to implement using continuous-time spiking dynamical primitives. Specifically, equation 3.1 requires that we maintain a rolling window of length $\theta$ (i.e., the history of $u(t)$, going $\theta$ seconds back in time). Thus, computing a delay of $\theta$ seconds is just as hard as computing every delay of length $\theta'$, for all $0 \le \theta' \le \theta$. Since any finite interval of $\mathbb{R}$ contains an uncountably infinite number of points, an exact solution for arbitrary $u(t)$ requires that we maintain an uncountably infinite amount of information in memory. Second, the delay provides us with a window of input history from which to compute arbitrary nonlinear functions across time. For instance, the spectrogram of a signal may be computed by a nonlinear combination of delays, as may any finite impulse response (FIR) filter. Third, delays introduce a rich set of interesting dynamics into large-scale neural models, including oscillatory bumps, traveling waves, lurching waves, standing waves, aperiodic regimes, and regimes of multistability (Roxin, Brunel, & Hansel, 2005). Fourth, a delay line can be coupled with a single nonlinearity to construct a network displaying many of the same benefits as reservoir computing (Appeltant et al., 2011). Finally, examining the specific case of the continuous-time delay introduces several methods and concepts we employ in generally extending the NEF (see section 4).

3.1  Implementation

As it is impossible in practice (i.e., given finite-order continuous-time resources) to implement an arbitrary delay, we will instead approximate $u(t)$ as a low-frequency signal or, equivalently, approximate equation 3.1 as a low-dimensional system expanded about the zeroth frequency in the Laplace domain. We begin by transforming equation 3.1 into the Laplace domain, and then using the fact that $\mathcal{L}\{u(t - \theta)\} = e^{-\theta s}\,\mathcal{L}\{u(t)\}$ to obtain
$$F(s) := \frac{Y(s)}{U(s)} = e^{-\theta s}, \qquad (3.2)$$
where $F(s)$ is known as the transfer function of the system, defined as the ratio of the Laplace transform of the output to the Laplace transform of the input. Equation 3.2 should be understood as an equivalent way of expressing equation 3.1 in the Laplace domain, where the variable $s$ denotes a complex frequency. Thus far, we have only described the transfer function that we would like the network to implement.
The state-space model discussed in section 2.3 (see equation 2.7) is related to its transfer function by
$$F(s) = C(sI - A)^{-1}B + D. \qquad (3.3)$$
Conversely, a transfer function can be converted into a state-space model satisfying equation 3.3 if and only if it can be written as a proper ratio of finite polynomials in $s$ (Brogan, 1991). The ratio is proper when the degree of the numerator does not exceed that of the denominator. In such a case, the output will not depend on future input, so the system is causal. The degree of the denominator corresponds to the dimensionality of the state vector and therefore must be finite. These two conditions align with physically realistic constraints where time may only progress forward and neural resources are limited.
However, the pure delay (see equation 3.2) has infinite order when expressed as a ratio of polynomials in $s$, and so the system is irrational, or infinite dimensional. Consequently, no finite state-space model will exist for $e^{-\theta s}$, which formalizes our previous intuition that an exact solution is impossible for finite, continuous-time systems. To overcome this, we must approximate the irrational transfer function as a proper ratio of finite-order polynomials. We do so using its Padé approximants—the coefficients of a Taylor series extended to the ratio of two polynomials—expanded about $s = 0$ (Padé, 1892; Vajta, 2000):
$$[p/q]_{e^{-\theta s}} = \frac{\displaystyle \sum_{i=0}^{p} \binom{p}{i}\, \frac{(p + q - i)!}{(p + q)!}\, (-\theta s)^{i}}{\displaystyle \sum_{i=0}^{q} \binom{q}{i}\, \frac{(p + q - i)!}{(p + q)!}\, (\theta s)^{i}}. \qquad (3.4)$$
This provides the transfer function of order $p$ in the numerator and order $q$ in the denominator that optimally approximates equation 3.2 for low-frequency inputs (i.e., its Taylor series matches that of the pure delay up to order $p + q$).

After choosing $p$ and $q$, we may numerically find a state-space model that satisfies equation 3.3 using standard methods5 and then map this system onto the synapse using principle 3. However, naively applying this conversion leads to numerical issues in the representation (i.e., dimensions that grow exponentially in magnitude) due in part to the factorials in equation 3.4.
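
For instance, one such standard route is sketched below, under the assumption that SciPy's generic Padé and transfer-function utilities are acceptable stand-ins; the article's own implementation may differ, and this naive construction exhibits the conditioning issues noted above for larger orders:

```python
import numpy as np
from math import factorial
from scipy.interpolate import pade
from scipy.signal import tf2ss, freqs

def pade_delay(theta, p, q):
    """[p/q] Pade approximants of exp(-theta*s), returned as polynomial coefficients."""
    taylor = [(-theta) ** k / factorial(k) for k in range(p + q + 1)]
    num, den = pade(taylor, q)       # numerator of degree p, denominator of degree q
    return num.coeffs, den.coeffs

theta, p, q = 1.0, 5, 6              # assumed example values
num, den = pade_delay(theta, p, q)
A, B, C, D = tf2ss(num, den)         # a state-space model satisfying equation 3.3

# Compare against the ideal delay exp(-theta*s) along the imaginary axis (up to 2 Hz).
w = np.linspace(0, 2 * np.pi * 2, 1000)
_, h = freqs(num, den, worN=w)
print("max |error| below 2 Hz:", np.max(np.abs(h - np.exp(-1j * w * theta))))
```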

To overcome this problem, we derive an equivalent yet normalized state-space model that we have not encountered elsewhere. We do so for the case of $p = q - 1$, since this provides the best approximation to the step response (derived in section A.1),
formula
3.5
where the coefficients are as derived in section A.1. This model is now equivalent to equation 3.4, but without any factorials and in the form of equation 2.7.6 The choice of $q$ corresponds to the dimensionality of the latent state vector that is to be represented by principle 1 and transformed by principle 2. Principle 3 may then be used to map equation 3.5 onto a spiking dynamical network to accurately implement an optimal low-frequency approximation of the delay.

To demonstrate, we implement a 1 s delay of 1 Hz band-limited white noise using 1000 recurrently connected spiking LIF neurons representing a six-dimensional vector space (see Figure 3). The connections between neurons are determined by applying principle 3 (see section 2.3) to the state-space model derived above (see equation 3.5, $q = 6$) via the Padé approximants of the delay. The normalized root-mean-squared error (NRMSE) of the output signal is 0.048 (normalized so that 1.0 would correspond to random chance). This is achieved without appealing to the simulation time step (1 ms); in fact, as shown in section 5.2, the network accuracy improves as the time step approaches zero due to the continuous-time assumption mentioned in section 2.3 (and resolved in section 4.2).
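
A minimal Nengo sketch of this construction is given below. The synaptic time constant, signal duration, and radius are assumptions for illustration, and the delay state space is built naively from Padé approximants rather than from the normalized model of equation 3.5; in practice, the system should also be normalized (see section 5.1) so that the represented state fits within the ensemble's radius:

```python
import numpy as np
import nengo
from math import factorial
from scipy.interpolate import pade
from scipy.signal import tf2ss

def delay_ss(theta, q):
    """[q-1/q] Pade approximation of exp(-theta*s) as a state-space model (section 3.1)."""
    taylor = [(-theta) ** k / factorial(k) for k in range(2 * q)]
    num, den = pade(taylor, q)
    return tf2ss(num.coeffs, den.coeffs)

theta, q, tau = 1.0, 6, 0.1          # tau is an assumed synaptic time constant
A, B, C, D = delay_ss(theta, q)      # D is zero for the [q-1/q] approximant

with nengo.Network(seed=0) as model:
    stim = nengo.Node(nengo.processes.WhiteSignal(period=10.0, high=1.0, y0=0))
    x = nengo.Ensemble(n_neurons=1000, dimensions=q, radius=1.0)
    # Principle 3 mapping for a first-order low-pass synapse (equation 4.6).
    nengo.Connection(x, x, transform=tau * A + np.eye(q), synapse=tau)
    nengo.Connection(stim, x, transform=tau * B, synapse=tau)
    readout = nengo.Node(size_in=1)
    nengo.Connection(x, readout, transform=C, synapse=tau)
    p_out = nengo.Probe(readout, synapse=0.01)
    p_in = nengo.Probe(stim)

with nengo.Simulator(model, dt=0.001) as sim:
    sim.run(10.0)
# sim.data[p_out] now approximates sim.data[p_in] shifted by theta seconds.
```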

Figure 3:

Delay of 1 s implemented by applying standard principle 3 to equation 3.5 using $q = 6$, a 1 ms time step, 1000 spiking LIF neurons, and a low-pass synapse with time constant $\tau$. The input signal is white noise with a cutoff frequency of 1 Hz. The plotted spikes are filtered with the same synaptic time constant and encoded with respect to 1000 encoders sampled uniformly from the surface of the hypersphere (sorted by time to peak activation).

3.2  Temporal Coding

The $q$-dimensional state vector of the delay network represents a rolling window of length $\theta$. That is, a single delay network with some fixed $\theta$ may be used to accurately decode any delay of length $\theta'$ ($0 \le \theta' \le \theta$). Different decodings require different linear output transformations ($C$) for each $\theta'$, with the following coefficients (derived in section A.2):
formula
3.6
The underlying dynamical state remains the same.

In Figure 4, we take different linear transformations of the same state vector by evaluating equation 3.6 at various delays between 0 and $\theta$ to decode the rolling window of input from the state of the system.7 This demonstrates that the delay network compresses the input's history (lasting $\theta$ seconds) into a low-dimensional state.

Figure 4:

Decoding a rolling window of length $\theta$. Each line corresponds to a different delay $\theta'$, ranging from 0 to $\theta$, decoded from a single delay network. (Left) Error of each delay as the input frequency (expressed relative to $1/\theta$) is increased. Shorter delays are decoded more accurately than longer delays at higher frequencies. A triangle marks the frequency used in the right panel. (Right) Example simulation decoding a rolling window of band-limited white noise.

In Figure 5, we sweep equation 3.6 across $0 \le \theta' \le \theta$ to visualize the temporal “basis functions” of the delay network. This provides a way to understand the relationship between the chosen state-space representation (i.e., the $q$-dimensional $\mathbf{x}(t)$) and the underlying window representation (i.e., the infinite-dimensional window $u(t - \theta')$, $0 \le \theta' \le \theta$). In particular, each basis function corresponds to the continuous window of history represented by a single dimension of the delay network. The instantaneous value of each dimension acts as a coefficient on its basis function, contributing to the representation of the window at that time. Overall, the entire state vector determines a linear combination of these basis functions to represent the window. This is analogous to the static function representation explored previously within the context of principles 1 and 2 (Eliasmith & Anderson, 2003).

Figure 5:

Temporal basis functions of the delay network. Each line corresponds to the basis function of a single dimension $i$, ranging from 0 (darkest) to $q - 1$ (lightest). The $i$th basis function is a polynomial in $\theta'/\theta$ (see equation 3.6). The state vector of the delay network takes a linear combination of these basis functions in order to represent a rolling window of length $\theta$.

The encoder of each neuron can also be understood directly in these terms as taking a linear combination of the basis functions (via equation 2.1). Each neuron nonlinearly encodes a projection of the rolling window onto some “preferred window” determined by its own encoder. Since the state vector is encoded by heterogeneous neural nonlinearities, the population's spiking activity supports the decoding of nonlinear functions across the entire window (i.e., functions that we can compute using principles 1 and 2). Therefore, we may conceptualize the delay network as a temporal coding of the input stimulus, which constructs a low-dimensional state, representing an entire window of history, to encode the temporal structure of the stimulus into a nonlinear high-dimensional space of neural activities.

To more thoroughly characterize the delay dynamics, we analyze the behavior of the delay network as the dimensionality is increased (see Figure 6). Specifically, we perform a standard principal component analysis (PCA) on the state vector $\mathbf{x}(t)$ for the impulse response and vary the order $q$. This allows us to visualize a subset of the state vector trajectories, via projection onto their first three principal components (see Figure 6, top). The length of this trajectory over time distinguishes different values of $q$ (see Figure 6, bottom); the shape of this length curve—approximately logarithmic, convex, or sigmoidal—depends on $q$. To generate this figure, we use a fixed delay $\theta$, but in fact, this analysis is scale invariant with time. This means that other delays will simply stretch or compress the impulse response linearly in time (not shown).
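
The following sketch reproduces this style of analysis on the (noise-free) linear system, using an assumed delay length and order; it is not the exact analysis pipeline used for Figure 6:

```python
import numpy as np
from math import factorial
from scipy.interpolate import pade
from scipy.signal import tf2ss
from scipy.linalg import expm

def delay_ss(theta, q):
    """[q-1/q] Pade approximation of exp(-theta*s) (see section 3.1)."""
    taylor = [(-theta) ** k / factorial(k) for k in range(2 * q)]
    num, den = pade(taylor, q)
    return tf2ss(num.coeffs, den.coeffs)

theta, q = 1.0, 6                                  # assumed example values
A, B, C, D = delay_ss(theta, q)
t = np.linspace(0, theta, 1000)

# Impulse response of the state: x(t) = expm(A*t) B.
X = np.array([expm(A * ti).dot(B).ravel() for ti in t])

# Project onto the first three principal components (as in Figure 6, top).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
projected = Xc.dot(Vt[:3].T)

# Normalized arc length of the trajectory over time (as in Figure 6, bottom).
arc = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(X, axis=0), axis=1))])
arc /= arc[-1]
```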

Figure 6:

Impulse response of the delay network with various orders ($q$) of Padé approximants. (Top) The state vector projected onto its first three principal components. (Bottom) The length of the curve up to time $t$, computed using the integral $\int_0^t \|\dot{\mathbf{x}}(t')\|_2\, dt'$ (normalized to 1 at $t = \theta$). This corresponds to the distance traveled by the state vector over time. The dashed line marks the last inflection point, indicating when $\mathbf{x}(t)$ begins to slow down.

The delay network is scale invariant with the delay length over input frequency; that is, the accuracy for a chosen order $q$ is a function of the product of the delay length and the input frequency (see the units in Figure 4, for instance). More specifically, for a fixed approximation error, the achievable delay length scales inversely with the input frequency. Then the accuracy of the mapped delay is a function of the relative magnitude of the delay length to the timescale of the synapse, whose exact shape depends on the considered synapse model. To determine these functions for a wide class of synapses, we proceed by extending the NEF.

4  Extending the Neural Engineering Framework

With the example of building a continuous-time delay in hand, we now proceed to extend principle 3 to arbitrary linear synapses. We focus on building a comprehensive theory for linear systems, but many of these same techniques also carry over to the case of nonlinear dynamical systems with heterogeneous synapses (Voelker, Benjamin et al., 2017; Voelker & Eliasmith, 2017).

Let $F(s)$ be the transfer function for the linear dynamics that we wish to implement (see equations 2.7 and 3.3), and let $H(s)$ be the transfer function for an arbitrary linear synapse model. As stated in section 2.3, introducing the synapse model means replacing the integrator ($\frac{1}{s}$) with $H(s)$. This is equivalent to replacing $s$ with $\frac{1}{H(s)}$. Notably, substituting $H(s)$ for the integrator results in the transfer function $F\big(\frac{1}{H(s)}\big)$, which no longer implements the original, desired dynamics $F(s)$. However, we would like to ensure that the new dynamics match the originally specified $F(s)$. The key insight is that we can determine a new function, $F^{H}(s)$, such that $F^{H}\big(\frac{1}{H(s)}\big) = F(s)$. That is, we can solve for a function $F^{H}$ that provides the original dynamics $F$ when implemented using the transfer function $H$ as the dynamical primitive. This is formalized by the following definition:

Definition 1.
A function $F^{H}(s)$ maps $F(s)$ onto $H(s)$ if and only if it satisfies
$$F^{H}\!\left(\frac{1}{H(s)}\right) = F(s), \qquad \text{for all } s \in \mathbb{C}. \qquad (4.1)$$

This definition compactly expresses the notion of a change of dynamical primitive in that $F$ is mapped from the canonical primitive, $\frac{1}{s}$, onto some new primitive, $H(s)$. Trivially, $F$ maps itself onto $\frac{1}{s}$. Nontrivial examples are given throughout sections 4.1 to 4.4.

Once we identify an $F^{H}$ that maps $F$ onto $H$, any state-space model $(A^{H}, B^{H}, C^{H}, D^{H})$ that satisfies
$$F^{H}(s) = C^{H}\big(sI - A^{H}\big)^{-1}B^{H} + D^{H} \qquad (4.2)$$
will implement the desired dynamics $F(s)$ when using $H(s)$ as the dynamical primitive, by equations 3.3 and 4.1 (see Figure 7).
Figure 7:

Block diagram for an LTI system, equivalent to Figure 1, with the integrator replaced by a more general linear filter $H(s)$. The state-space model $(A^{H}, B^{H}, C^{H}, D^{H})$ is obtained from some transfer function $F^{H}(s)$ that maps $F(s)$ onto $H(s)$, as defined in the text. This generalizes Figure 2 to arbitrary linear synapse models.

Therefore, supposing $F^{H}$ satisfies definition 1 and that it is convertible to a state-space model (see equation 2.7), then Figure 7 is just another form of Figure 1, but with the integrator replaced by the synapse. Note that this construction, on its own, does not specify how to find a satisfying $F^{H}$, whether such a function exists, or whether it can be converted to a state-space model. We provide several examples leading to such a specification in section 4.4.

Before proceeding, we remark that the above theory directly carries over from the continuous-time domain to the discrete-time domain. The discrete-time formulation of an LTI system is similar to equation 2.7, but increments time in steps of length $\Delta t$:
$$\mathbf{x}[t + \Delta t] = \bar{A}\,\mathbf{x}[t] + \bar{B}\,\mathbf{u}[t], \qquad \mathbf{y}[t] = \bar{C}\,\mathbf{x}[t] + \bar{D}\,\mathbf{u}[t], \qquad (4.3)$$
where the discrete state-space model $(\bar{A}, \bar{B}, \bar{C}, \bar{D})$ fully defines the system. The discrete-time equivalent to the Laplace transform is the $z$-transform, named for its use of the variable $z$ to denote the complex frequency domain. In this domain, $z^{-1}$ plays the role of $\frac{1}{s}$ by performing a discrete shift forward, one step in time (i.e., a delay of one time step), instead of integration. A well-known result is that the transfer function of this discrete LTI system, defined as the ratio of the $z$-transform of the output to the $z$-transform of the input, is equal to $\bar{F}(z) = \bar{C}(zI - \bar{A})^{-1}\bar{B} + \bar{D}$. Consequently, all of the previous discussion carries over to discrete LTI systems. In particular, for a discrete synapse expressed using the $z$-transform, $\bar{H}(z)$, we have the analogous definition:
Definition 2.
A function $\bar{F}^{\bar{H}}(z)$ maps $\bar{F}(z)$ onto $\bar{H}(z)$ if and only if it satisfies
$$\bar{F}^{\bar{H}}\!\left(\frac{1}{\bar{H}(z)}\right) = \bar{F}(z), \qquad \text{for all } z \in \mathbb{C}. \qquad (4.4)$$
Given some $\bar{F}^{\bar{H}}$ that maps $\bar{F}$ onto $\bar{H}$, any state-space model $(\bar{A}^{\bar{H}}, \bar{B}^{\bar{H}}, \bar{C}^{\bar{H}}, \bar{D}^{\bar{H}})$ that satisfies
$$\bar{F}^{\bar{H}}(z) = \bar{C}^{\bar{H}}\big(zI - \bar{A}^{\bar{H}}\big)^{-1}\bar{B}^{\bar{H}} + \bar{D}^{\bar{H}} \qquad (4.5)$$
will implement the desired dynamics $\bar{F}(z)$ when using $\bar{H}(z)$ as the dynamical primitive. Hence, the task of determining the mapped transfer function is identical for both continuous- and discrete-time domains; it is only the dynamical primitive ($H(s)$ versus $\bar{H}(z)$) and the desired system ($F(s)$ versus $\bar{F}(z)$) that differ.

4.1  Continuous Low-Pass Synapse

The first example we consider demonstrates that our new theory recovers the standard form of principle 3 from the NEF (see section 2.3). For the case of a continuous-time first-order low-pass filter (see equation 2.8), $H(s) = \frac{1}{\tau s + 1}$, let
$$F^{H}(s) := F\!\left(\frac{s - 1}{\tau}\right).$$
Then
$$F^{H}\!\left(\frac{1}{H(s)}\right) = F^{H}(\tau s + 1) = F\!\left(\frac{(\tau s + 1) - 1}{\tau}\right) = F(s),$$
which satisfies definition 1. Therefore, by equation 4.2,
$$\big(A^{H}, B^{H}, C^{H}, D^{H}\big) = \big(\tau A + I,\; \tau B,\; C,\; D\big). \qquad (4.6)$$
This completes our novel proof of principle 3 from section 2.3.

4.2  Discrete Low-Pass Synapse

When simulating any NEF network on a digital computer, we necessarily use time steps of some length $\Delta t$ to advance the state of the network, updating at discrete moments in time (Bekolay et al., 2013). For instance, Nengo currently uses a default of $\Delta t = 1$ ms and implements a zero-order hold (ZOH) discretization of the synaptic filter. Implementing ZOH means that all continuous-time signals are held constant within each time step. This discretization of equation 2.8 gives
$$\bar{H}(z) = \frac{1 - \bar{a}}{z - \bar{a}}, \qquad \bar{a} := e^{-\Delta t / \tau}.$$
If our desired transfer function is expressed in continuous time, $F(s)$, we should also discretize it to $\bar{F}(z)$, with the same time step, and again using ZOH discretization for consistency. Let
$$\bar{F}^{\bar{H}}(z) := \bar{F}\big((1 - \bar{a})\, z + \bar{a}\big).$$
Then,
$$\bar{F}^{\bar{H}}\!\left(\frac{1}{\bar{H}(z)}\right) = \bar{F}\!\left((1 - \bar{a})\, \frac{z - \bar{a}}{1 - \bar{a}} + \bar{a}\right) = \bar{F}(z),$$
which satisfies definition 2. Therefore, by equation 4.5,
$$\big(\bar{A}^{\bar{H}}, \bar{B}^{\bar{H}}, \bar{C}^{\bar{H}}, \bar{D}^{\bar{H}}\big) = \left(\frac{\bar{A} - \bar{a} I}{1 - \bar{a}},\; \frac{\bar{B}}{1 - \bar{a}},\; \bar{C},\; \bar{D}\right) \qquad (4.7)$$
provides an exact implementation of the desired system for digital architectures, regardless of the simulation time step (assuming ZOH).
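
A sketch of this discrete mapping, assuming SciPy's ZOH discretization and an example oscillator with assumed parameter values, is given below; the final assertion checks definition 2 at a sample frequency:

```python
import numpy as np
from scipy.signal import cont2discrete

def discrete_lowpass_map(A, B, C, D, tau, dt):
    """Map (A, B, C, D) onto a ZOH-discretized low-pass synapse (equation 4.7)."""
    Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, D), dt, method='zoh')
    a = np.exp(-dt / tau)                          # pole of the discretized low-pass
    n = Ad.shape[0]
    return (Ad - a * np.eye(n)) / (1 - a), Bd / (1 - a), Cd, Dd

# Example: a 1 Hz oscillator, a 100 ms synapse, and a 1 ms time step (assumed values).
dt, tau, omega = 0.001, 0.1, 2 * np.pi
A = np.array([[0.0, -omega], [omega, 0.0]])
B = np.array([[1.0], [0.0]])
C, D = np.eye(2), np.zeros((2, 1))
A_map, B_map, C_map, D_map = discrete_lowpass_map(A, B, C, D, tau, dt)

# Check definition 2 at a sample point on the unit circle: Fbar^H(1/Hbar(z)) == Fbar(z).
z = np.exp(1j * 2 * np.pi * 3.0 * dt)              # 3 Hz
a = np.exp(-dt / tau)
Hbar = (1 - a) / (z - a)                           # ZOH discretization of 1/(tau*s + 1)
Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, D), dt, method='zoh')
F_desired = Cd.dot(np.linalg.inv(z * np.eye(2) - Ad)).dot(Bd) + Dd
F_mapped = C_map.dot(np.linalg.inv((1 / Hbar) * np.eye(2) - A_map)).dot(B_map) + D_map
assert np.allclose(F_desired, F_mapped)
```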

4.3  Delayed Continuous Low-Pass Synapse

Next, we consider a continuous-time first-order low-pass filter with a time delay of $\lambda$ seconds:
$$H(s) = \frac{e^{-\lambda s}}{\tau s + 1}. \qquad (4.8)$$
This same model has been proposed by Roth and van Rossum (2009, equation 6.2) as a more realistic alternative to equation 2.8, which includes an axonal transmission delay of length $\lambda$ (on the order of milliseconds) to account for the finite-velocity propagation of action potentials. By commutativity of convolution, modeling the delay in the synapse (as in equation 4.8) is equivalent to modeling the delay in spike propagation. Equation 4.8 may also be used to account for feedback delays within some broader setting (e.g., when the feedback term is computed via some delayed system).
Letting $w := \frac{1}{H(s)} = (\tau s + 1)\, e^{\lambda s}$, and letting $W_0(\cdot)$ denote the principal branch of the Lambert-$W$ function (Corless, Gonnet, Hare, Jeffrey, & Knuth, 1996), we invert $\frac{1}{H(s)}$ as follows:
$$\begin{aligned} \frac{\lambda}{\tau}\, e^{\lambda/\tau}\, w &= \lambda\!\left(s + \frac{1}{\tau}\right) e^{\lambda\left(s + \frac{1}{\tau}\right)} \\ W_0\!\left(\frac{\lambda}{\tau}\, e^{\lambda/\tau}\, w\right) &= \lambda\!\left(s + \frac{1}{\tau}\right) \\ s &= \frac{1}{\lambda}\, W_0\!\left(\frac{\lambda}{\tau}\, e^{\lambda/\tau}\, w\right) - \frac{1}{\tau}, \end{aligned}$$
where the second-last line assumes that the argument $\lambda\left(s + \frac{1}{\tau}\right)$ lies within the range of the principal branch (Corless et al., 1996, equation 4.4).8 Therefore,
$$F^{H}(s) = F\!\left(\frac{1}{\lambda}\, W_0\!\left(\frac{\lambda}{\tau}\, e^{\lambda/\tau}\, s\right) - \frac{1}{\tau}\right). \qquad (4.9)$$
As a demonstration of how we might use this mapping, suppose the desired transfer function for our system is a time delay of $\theta$ seconds, $F(s) = e^{-\theta s}$ (see equation 3.2). In this setting, we are attempting to “amplify” a delay of $\lambda$ seconds in the synapse into a system delay of $\theta$ seconds at the network level. Letting $c := \theta / \lambda$ and $d := \lambda / \tau$, and then substituting into equation 4.9 provides the required function:
$$F^{H}(s) = e^{c d}\, e^{-c\, W_0\left(d\, e^{d}\, s\right)}.$$
This may be numerically converted into a state-space model, of arbitrary dimensionality $q$, via the Padé approximants of the following Maclaurin series:
$$F^{H}(s) = e^{c d} \sum_{n=0}^{\infty} \frac{c\, (n + c)^{n-1}}{n!}\, \big(-d\, e^{d}\, s\big)^{n}. \qquad (4.10)$$
To our knowledge, there is no closed-form expression for the Padé approximants of equation 4.10, but there are methods to compute them accurately and efficiently (Sidi, 2003). Given these approximants, we may follow the same procedure from section 3.1 to obtain an LTI system in the form of equation 2.7. In section 5.2, we use this approach to improve the accuracy of the delay network demonstrated in section 3.1. We remark that each of $c$, $d$, and $cd = \theta/\tau$ is a dimensionless (i.e., unitless) constant that can be used to relate measurable properties of a biological system that may be governed by this description to the necessary network-level computations.
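
The following sketch numerically checks the Lambert-W inversion underlying equation 4.9, using SciPy's lambertw and assumed values of $\tau$ and $\lambda$:

```python
import numpy as np
from scipy.special import lambertw

tau, lam = 0.1, 0.01                 # synaptic time constant and axonal delay (assumed values)

def one_over_H(s):
    """Reciprocal of the delayed low-pass synapse, equation 4.8."""
    return (tau * s + 1) * np.exp(lam * s)

def invert(w):
    """Solve w = 1/H(s) for s via the principal branch of the Lambert-W function (equation 4.9)."""
    return lambertw(lam / tau * np.exp(lam / tau) * w, k=0) / lam - 1.0 / tau

# The inversion should recover s exactly along the imaginary axis (the frequencies of interest).
s = 1j * 2 * np.pi * np.linspace(0.1, 20, 50)
assert np.allclose(invert(one_over_H(s)), s)
```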

4.4  General Linear Synapse

Finally, we consider the general class of all linear synapse models of the form
$$H(s) = \frac{1}{\displaystyle \sum_{i=0}^{k} c_i\, s^{i}}, \qquad (4.11)$$
for some polynomial coefficients $c_0, \ldots, c_k$ of arbitrary degree $k$. To the best of our knowledge, this class includes the majority of linear synapse models used in the literature. For instance, this includes the first-order low-pass synapse, $\frac{1}{\tau s + 1}$, that is standard in the NEF. It also includes the second-order alpha synapse, $\frac{1}{(\tau s + 1)^2}$ (Rall, 1967)—the convolution of two exponentials with identical time constants—which is commonly used in biological models (Koch & Segev, 1989; Destexhe, Mainen, & Sejnowski, 1994b; Mainen & Sejnowski, 1995; Destexhe, Mainen, & Sejnowski, 1998; Roth & van Rossum, 2009). The alpha synapse essentially filters the spike trains twice, to produce PSCs with finite (noninstantaneous) rise times. In addition, equation 4.11 includes a generalization of the alpha synapse, the double exponential synapse, $\frac{1}{(\tau_1 s + 1)(\tau_2 s + 1)}$ (Wilson & Bower, 1989)—the convolution of two exponentials with time constants $\tau_1$ and $\tau_2$—that has different rise and fall times to account for the separate timescales of rapid transmitter binding followed by slow unbinding (Destexhe et al., 1994b; Häusser & Roth, 1997; Roth & van Rossum, 2009). The double exponential is also a suitable model to account for parasitic capacitances in neuromorphic hardware (Voelker, Benjamin et al., 2017). Furthermore, equation 4.11 includes a higher-order generalization of the alpha synapse, the gamma kernel, $\frac{1}{(\tau s + 1)^{k}}$ (De Vries & Principe, 1992, equation 19). Finally, this class contains more exotic models, such as the second-order bandpass synapse model, $\frac{1}{(s/\omega)^2 + \frac{1}{Q}(s/\omega) + 1}$, where $\omega$ is the peak frequency in radians per second and $Q$ is inversely proportional to the bandwidth. Such synapses have been used to model bandpass filtering from mechanoreceptors in the fingertip (Friedl et al., 2016) or from rods in the retina (Armstrong-Gold & Rieke, 2003).
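
To make the class of equation 4.11 concrete, the following sketch builds the denominator coefficients of several of these synapse models with NumPy; the time constants and bandpass parameters are assumed example values, and the bandpass form shown is one common second-order parameterization rather than necessarily the exact model referenced above:

```python
import numpy as np

tau, tau1, tau2 = 0.1, 0.1, 0.02     # example time constants (assumed)
omega, Q = 2 * np.pi * 40, 5.0       # example bandpass peak frequency and quality factor (assumed)

# Each synapse H(s) = 1 / (c_k s^k + ... + c_1 s + c_0), stored as its denominator
# polynomial with the highest power first (the form of equation 4.11).
lowpass = np.array([tau, 1.0])                          # tau*s + 1
alpha = np.polymul([tau, 1.0], [tau, 1.0])              # (tau*s + 1)^2
double_exp = np.polymul([tau1, 1.0], [tau2, 1.0])       # (tau1*s + 1)(tau2*s + 1)
bandpass = np.array([1.0 / omega ** 2, 1.0 / (omega * Q), 1.0])   # a common resonant biquad

for name, den in [("low-pass", lowpass), ("alpha", alpha),
                  ("double exponential", double_exp), ("bandpass", bandpass)]:
    print(name, den)
```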

As well, in section A.5, we demonstrate how to use equation 4.11 to model synapses with pure delays and to prove a complete connection to ZOH discretization. Specifically, by the use of Padé approximants, we may rewrite any transfer function with a Taylor series expansion (even those that are irrational or improper) in the form of equation 4.11, albeit with some radius of convergence. This permits us to model, for instance, synapses with pulse-extended responses (Voelker, Benjamin et al., 2017).9 Similarly, the delayed low-pass, equation 4.8, may be expressed in the form of equation 4.11. Finally, any linear combinations of the aforementioned synapse models will also be a member of this class. Nonlinear synapse models, such as conductance-based synapses (e.g., Destexhe, Mainen, & Sejnowski, 1994a, equation 6), are a current subject of study in the NEF (Stöckel, Voelker, & Eliasmith, 2017).

To map $F(s)$ onto equation 4.11, we begin by defining our solution $F^{H}(s)$ in the form of its state-space model (see equation 4.2),
$$A^{H} = \sum_{i=0}^{k} c_i A^{i}, \qquad B^{H} = \sum_{i=1}^{k} c_i \sum_{j=0}^{i-1} A^{i-1-j} B\, s^{j}, \qquad C^{H} = C, \qquad D^{H} = D, \qquad (4.12)$$
which we claim satisfies the required identity, $F^{H}\big(\frac{1}{H(s)}\big) = F(s)$ (equation 4.1). To prove this claim, we first rearrange the following expression:
$$\frac{1}{H(s)}\, I - A^{H} = \sum_{i=0}^{k} c_i \big(s^{i} I - A^{i}\big) = (sI - A) \sum_{i=1}^{k} c_i \sum_{j=0}^{i-1} s^{j} A^{i-1-j}.$$
We then complete the proof by substituting this result and the state-space model into our expression for the mapped transfer function from equation 4.2:
$$F^{H}\!\left(\frac{1}{H(s)}\right) = C^{H}\!\left(\frac{1}{H(s)}\, I - A^{H}\right)^{-1} B^{H} + D^{H} = C\,(sI - A)^{-1} B + D = F(s).$$
An alternative derivation may also be found in Voelker and Eliasmith (2017, section 2.2). An interesting insight gained with this solution is that it is not, in general, the case that $B^{H}$ is time invariant, since it depends on $s$. As a result, solutions will often not be a typical state-space model in the form of equation 2.7.

Nevertheless, such solutions can still be implemented, as is, in practice. Since $s^{j}$ is the $j$th-order differential operator, this form of $B^{H}$ states that we must supply the $j$th-order input derivatives for all $0 \le j \le k - 1$. When $k = 1$, as in section 4.1, this is trivial (no derivatives are needed). For $k \ge 2$, let us first define $B_j := \sum_{i=j+1}^{k} c_i A^{i-1-j} B$. Then equation 4.12 shows that the ideal state-space model must implement the input transformation as a linear combination of input derivatives, $\sum_{j=0}^{k-1} B_j\, \mathbf{u}^{(j)}(t)$. If the derivatives of the input are available to the model, then the $F^{H}$ we described may be used to precisely implement the desired $F$.

However, if the required derivatives are not included in the neural representation, then it is natural to use a ZOH method by assuming $\mathbf{u}^{(j)}(t) = 0$ for all $j \ge 1$:
$$B^{H} = \sum_{i=1}^{k} c_i\, A^{i-1} B, \qquad (4.13)$$
with $A^{H}$, $C^{H}$, and $D^{H}$ as before (see equation 4.12). This is now an equivalent model to equation 4.12 assuming ZOH, and in the form of the standard state-space model (see equation 2.7). In section A.4, we characterize the dynamics of this new system in terms of $F$ and $H$ (independent of the chosen state-space model) for general inputs. We find that equation 4.13 adds new dimensions, for every dimension in $\mathbf{x}$, to the state underlying the resulting dynamical system. In section A.5, we show that our results yield a novel derivation of ZOH discretization for LTI systems.
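
A sketch of this ZOH mapping for a general polynomial synapse follows; it assumes coefficients indexed by increasing powers of $s$ and checks that the first-order case recovers the standard principle 3 mapping:

```python
import numpy as np

def zoh_general_map(A, B, C, D, c):
    """Map (A, B, C, D) onto a synapse H(s) = 1/(c[k] s^k + ... + c[0]) under the ZOH
    assumption that the input derivatives are zero (the form of equation 4.13).
    Here c[i] is the coefficient of s^i."""
    A_map = sum(c[i] * np.linalg.matrix_power(A, i) for i in range(len(c)))
    B_map = sum(c[i] * np.linalg.matrix_power(A, i - 1).dot(B) for i in range(1, len(c)))
    return A_map, B_map, C, D

# Example: a double exponential synapse, 1/((tau1*s + 1)(tau2*s + 1)) (assumed values).
tau1, tau2 = 0.1, 0.02
c = [1.0, tau1 + tau2, tau1 * tau2]
omega = 2 * np.pi
A = np.array([[0.0, -omega], [omega, 0.0]])
B = np.array([[1.0], [0.0]])
C, D = np.eye(2), np.zeros((2, 1))
A_map, B_map, _, _ = zoh_general_map(A, B, C, D, c)

# The first-order case (k = 1) recovers the standard principle 3 mapping of equation 4.6.
A_chk, B_chk, _, _ = zoh_general_map(A, B, C, D, [1.0, tau1])
assert np.allclose(A_chk, tau1 * A + np.eye(2)) and np.allclose(B_chk, tau1 * B)
```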

To our knowledge, this specific state-space architecture has not been explored in theory or in practice. Given that equation 4.12 requires the input derivatives to accurately compute low-dimensional network-level dynamics, this strongly suggests that it is important to represent and compute derivatives in neural systems. Methods of computing derivatives have been explored by Tripp and Eliasmith (2010) within the NEF. As well, it has long been suggested that adaptive neural mechanisms (e.g., synaptic depression or spike rate adaptation) may also play a fundamental role in computing such derivatives (Abbott & Regehr, 2004; Lundstrom, Higgs, Spain, & Fairhall, 2008). However, there has typically been less emphasis placed on understanding temporal differentiation in comparison to other temporal operators, such as integration (Tripp & Eliasmith, 2010).

Finally, we again note that these results translate directly to the discrete-time domain. The required state-space matrices are the same as in equation 4.12, but with the $c_i$ corresponding to the coefficients of $\frac{1}{\bar{H}(z)}$, bars affixed to each state-space matrix, and $z$ substituted for $s$ in $\bar{B}^{\bar{H}}$. However, the operator $z^{j}$ is a shift backward by $j$ time steps (i.e., an acausal lookahead), and so the ZOH assumption is instead $\mathbf{u}[t + j\,\Delta t] = \mathbf{u}[t]$ for all $j \ge 1$. Thus, the discrete analog to equation 4.13 is
$$\bar{B}^{\bar{H}} = \sum_{i=1}^{k} \bar{c}_i \sum_{j=0}^{i-1} \bar{A}^{i-1-j}\, \bar{B}. \qquad (4.14)$$

5  Results

We are now in a position to bring together the two main themes of this work: implementing delays and extending the NEF to employ a wide variety of synapse models. To do so, we provide two example applications of our theory. First, we demonstrate that we can map the delay system onto a variety of synapse models used in practice. We show that including an axonal transmission delay in the synapse model significantly improves the ability of the network to compute continuous-time delays (and, by extension, any system that can be expressed in terms of delayed signals). Second, we explore intriguing similarities between our delay network responses and the recently discovered time cells (Eichenbaum, 2014; Tiganj, Hasselmo, & Howard, 2015), suggesting that this theory may provide a new understanding of these observed temporal responses in hippocampus, striatum, medial prefrontal cortex (mPFC), and elsewhere in cortex (Mello, Soares, & Paton, 2015; Luczak, McNaughton, & Harris, 2015).

5.1  Methods

All networks were built and simulated using Nengo 2.4.0 (Bekolay et al., 2013), a Python tool for simulating large-scale neural networks, with the nengolib 0.4.1 extension (Voelker, 2017; Voelker & Eliasmith, 2017). Simulations were run on a standard desktop computer10 using Python 2.7.3, Numpy 1.11.3, and SciPy 0.18.0rc1, linked with the Intel Math Kernel Library (MKL).

All simulations used a 1 ms time step (with the exception of Figure 9, which used a time step of 0.01 ms) and Nengo's default configurations for heterogeneous tuning curves and spiking LIF parameters. As detailed in section A.3, encoders were either sampled from the unit-axis vectors or scattered uniformly along the surface of the hypersphere using quasi–Monte Carlo methods. We further normalized the delay systems using balanced realizations and Hankel singular values.

The architecture of each network follows the setup described by the NEF in section 2. A tutorial by Sharma, Aubin, and Eliasmith (2016) provides some additional details regarding the software application of NEF methods. Figures were created using Seaborn (Waskom et al., 2015) and are reproducible via https://github.com/arvoelke/delay2017.

5.2  Delay Networks with Higher-Order Synapses

We begin by making the practical point that it is crucial to account for the effect of the simulation time step in digital simulations if the time step is not sufficiently small relative to the timescale of the desired network-level dynamics. To demonstrate this, we simulate a 27-dimensional delay network using 1000 spiking LIF neurons, implementing a 0.1 s delay of 50 Hz band-limited white noise. We vary the simulation time step ($\Delta t$) from 0.1 ms to 2 ms. The accuracy of our extension does not depend on $\Delta t$ (see Figure 8, left). When $\Delta t = 1$ ms (the default in Nengo), the standard principle 3 mapping (see equation 4.6) obtains an NRMSE of 1.425 (i.e., 42.5% worse than random chance) versus 0.387 for the discrete low-pass mapping that accounts for $\Delta t$ (see equation 4.7), a 73% reduction in error. As $\Delta t$ approaches 0, the two methods become equivalent.

Figure 8:

Comparing standard principle 3 to our NEF extensions. (Left) Error from mapping a 27-dimensional 0.1 s delay onto 1000 spiking LIF neurons while varying the simulation time step ($\Delta t$). The input to the network is white noise with a cutoff frequency of 50 Hz. Unlike our extension, the standard form of principle 3 does not account for $\Delta t$. A dashed vertical line indicates the default time step in Nengo. Error bars indicate a bootstrapped confidence interval across 25 trials. (Right) Mapping the delay system onto a delayed continuous low-pass synapse (see equation 4.8). The order of the delay system ($q$) is varied from 6 (lightest) to 27 (darkest). Each line evaluates the error between the realized frequency response and the ideal delay $e^{-\theta s}$, where the realized system is determined by mapping the delay of order $q$ onto equation 4.8 using one of the two following methods. The method of our extension, which accounts for the axonal transmission delay, has a monotonically increasing error that stabilizes at 1 (i.e., the high frequencies are filtered). The standard principle 3, which accounts for $\tau$ but ignores $\lambda$, alternates between phases of instability and stability as the frequency is increased.

More to the point, we can analyze the delay network's frequency response when using a delayed continuous low-pass synapse (see equation 4.8) instead of the canonical low-pass (see equation 2.8) as the dynamical primitive. This provides a direct measure of the possible improvement gains when using the extension. Figure 8, right, compares the use of principle 3 (which accounts for $\tau$ but ignores $\lambda$) to our extension (which fully accounts for both; see section 4.3). The figure reveals that increasing the dimensionality improves the accuracy of our extension while magnifying the error from principle 3. In the worst case, the absolute error of the principle 3 mapping becomes extremely large. In practice, saturation from the neuron model bounds this error by the maximum firing rates. Regardless, it is clearly crucial to account for axonal transmission delays to accurately characterize the network-level dynamics.

To more broadly validate our NEF extensions from section 4, we map the delay system onto (1) a continuous low-pass synapse (see section 4.1), (2) a delayed continuous low-pass synapse (see section 4.3), and (3) a continuous double exponential synapse (see section 4.4). We apply each extension to construct delay networks of 2000 spiking LIF neurons. To compare the accuracy of each mapping, we make the time step sufficiently small (0.01 ms) to emulate a continuous-time setting. We use Padé approximants of the same order for both equations 3.4 and 4.10. For the delayed low-pass, we again fix the same dimensionless parameters as in Figure 8. For the double exponential, we fix its two time constants relative to $\theta$. Expressing these parameters as dimensionless constants keeps our results scale invariant with $\theta$.

Figure 9 reveals that axonal delays may be effectively amplified 10-fold while reducing the NRMSE by 71% compared to the low-pass (see Figure 9, right; NRMSE for low-pass = 0.702, delayed low-pass = 0.205, and double exponential = 0.541). The double exponential synapse outperforms the low-pass, despite the additional poles introduced by the ZOH assumption in equation 4.13 (see section A.4 for analysis). This is because the double exponential filters the spike noise twice. Likewise, by exploiting an axonal delay, the same level of error may be achieved at approximately 1.5 times higher frequencies, or equivalently for 1.5 times longer network delays, when compared to the low-pass synapse (see Figure 9, left). In summary, accounting for higher-order synaptic properties allows us to harness the axonal transmission delay to more accurately approximate network-level delays in spiking dynamical networks.

Figure 9:

The pure delay mapped onto spiking networks with various synapse models (parameters given in the text). (Left) Error of each mapping in the frequency domain. This subfigure is scale-invariant with $\theta$. (Right) Example simulation when $\theta = 0.1$ s and the input signal is white noise with a cutoff frequency of 15 Hz, corresponding to the triangle (over $\theta \times 15\,\text{Hz} = 1.5$) from the left subfigure. We use a time step of 0.01 ms and 2000 spiking LIF neurons.

Together, these results demonstrate that our extensions can significantly improve the accuracy of high-level network dynamics. Having demonstrated this for delays, in particular, suggests that the extension is useful for a wide variety of biologically relevant networks. To make this point more concrete, we turn to a consideration of time cells.

5.3  Time Cells

We now describe a connection between the delay network from section 3 and recent neural evidence regarding time cells. Time cells were initially discovered in the hippocampus and proposed as temporal analogs of the more familiar place cells (Eichenbaum, 2014). Similar patterns of neural activity have since been found throughout striatum (Mello et al., 2015) and cortex (Luczak et al., 2015) and have been extensively studied in the rodent mPFC (Kim, Ghim, Lee, & Jung, 2013; Tiganj, Jung, Kim, & Howard, 2016).

Interestingly, we find that our delay network produces qualitatively similar neural responses to those observed in time cells. This is shown in Figure 10 by comparing neural recordings from mPFC (Tiganj et al., 2016, Figures 4C and 4D) to the spiking activity from a network implementing a delay of the same length used in the original experiments. Specifically, in this network, a random population of 300 spiking LIF neurons maps a 4.784 s delay onto an alpha synapse using our extension. The delay system is realized using the state-space model of equation 3.5, the input signal is a rectangular pulse (with an area of 1.5) presented at the start of the simulation, and the simulation spans the full delay interval.

Figure 10:

Comparison of time cells to an NEF delay network. Top: Spiking activity from the rodent mPFC (reproduced from Tiganj et al., 2016, Figures 4C and 4D, by permission of Oxford University Press). Neural recordings were taken during a maze task involving a delay period of 4.784 s. Bottom: Delay network implemented using the NEF (see text for details). Seventy-three time cells are selected by uniformly sampling encoders from the surface of the hypersphere. (A) Cosine similarity between the activity vectors for every pair of time points. The diagonal is normalized to the warmest color. The similarity spreads out over time. (B) Neural activity sorted by the time to peak activation. Each row is normalized between 0 (cold) and 1 (warm). We overlay the curve from Figure 6 (bottom) to model the peak response times.
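
Panels A and B can be reproduced from any matrix of filtered spiking activity (time points by neurons). The sketch below shows both computations on such a matrix; the array name and the choice of filtering applied beforehand are assumptions made for illustration.

```python
import numpy as np

def time_cell_panels(activity):
    """activity: array of shape (timesteps, neurons) of filtered spike trains."""
    # (A) Cosine similarity between the activity vectors of every pair of time points.
    norms = np.linalg.norm(activity, axis=1, keepdims=True)
    unit = activity / np.maximum(norms, 1e-12)
    similarity = unit @ unit.T                       # (timesteps, timesteps)

    # (B) Activity sorted by time to peak activation, each neuron normalized to [0, 1].
    order = np.argsort(np.argmax(activity, axis=0))
    rows = activity[:, order].T                      # (neurons, timesteps)
    lo = rows.min(axis=1, keepdims=True)
    hi = rows.max(axis=1, keepdims=True)
    normalized = (rows - lo) / np.maximum(hi - lo, 1e-12)
    return similarity, normalized
```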

We also note a qualitative fit between the length curve in Figure 6 and the peak response times in Figure 10. Specifically, Figure 6 (bottom) models the nonuniform distribution of the peak response times of the cells as the length of the state vector's trajectory through time. Implicit in this model are the simplifying assumptions that encoders are uniformly distributed and that the L2-norm of the state vector remains constant throughout the delay period. Nevertheless, this model produces a curve that is qualitatively similar to the peak response times from Figure 10 (right; see overlay).

More quantitatively, we performed the same analysis on our simulated neural activity as Tiganj et al. (2016) performed on the biological data, to capture the relationship between the peak and width of each time cell. Specifically, we fit the spiking activity of each neuron with a gaussian to model the peak time and standard deviation of each cell's "time field." This fit was repeated for each of the 250 simulated spiking LIF neurons that remained after selecting only those whose spikes occurred predominantly within the delay interval. The correlation between the peak times and the time-field widths had a Pearson's coefficient of 0.68 for our simulated data, compared to 0.52 for the biological time cells. An ordinary linear regression linking the peak time (independent variable) with the time-field width (dependent variable) was also fit to our simulated data and compared against the corresponding fit reported for the time cell data. We note that we used the same bin size of 1 ms, modeled the same delay length, and did not perform any parameter fitting beyond the informal choices of cutoff, dimensionality, input signal area (1.5), and synaptic time constant.
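
A sketch of this analysis pipeline follows: fit a gaussian to each neuron's smoothed response, discard neurons that fire mostly outside the delay interval, and then correlate and regress the fitted widths against the fitted peak times. The smoothing, initial guesses, and selection threshold are assumptions made for illustration rather than the exact settings used above.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, linregress

def gaussian(t, a, mu, sigma, b):
    return a * np.exp(-0.5 * ((t - mu) / sigma) ** 2) + b

def time_field_stats(rates, t, delay=4.784, min_fraction=0.5):
    """rates: (timesteps, neurons) smoothed firing rates; t: time vector in seconds."""
    peaks, widths = [], []
    inside = (t >= 0) & (t <= delay)
    for i in range(rates.shape[1]):
        r = rates[:, i]
        if r.sum() == 0 or r[inside].sum() / r.sum() < min_fraction:
            continue                       # mostly active outside the delay interval
        guess = [r.max(), t[np.argmax(r)], 0.1 * delay, r.min()]
        try:
            (_, mu, sigma, _), _ = curve_fit(gaussian, t, r, p0=guess, maxfev=5000)
        except RuntimeError:
            continue                       # response is not gaussian-like
        if 0 <= mu <= delay:
            peaks.append(mu)
            widths.append(abs(sigma))
    rho, p_value = pearsonr(peaks, widths)
    regression = linregress(peaks, widths)   # slope, intercept, rvalue, pvalue, stderr
    return rho, p_value, regression
```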

Neural mechanisms previously proposed to account for time cell responses have either been speculative (Tiganj et al., 2016) or have relied on gradually changing firing rates from a bank of arbitrarily long, ideally spaced low-pass filters (Shankar & Howard, 2012; Howard et al., 2014; Tiganj et al., 2015; Tiganj, Shankar, & Howard, 2017). It is unclear whether such methods can be implemented accurately and scalably using heterogeneous spiking neurons. We suspect that a robust implementation is unlikely given the high precision typically relied on in these abstract models.

In contrast, our proposed spiking model has its network-level dynamics derived from first principles to optimally retain information throughout the delay interval, without relying on a particular synapse model or bank of filters. All of the neurons recurrently work together in a low-dimensional vector space to make efficient use of neural resources. By using the methods of the NEF, this solution is inherently robust to spiking noise and other sources of uncertainty. Furthermore, our explanation accounts for the nonlinear distribution of peak firing times as well as its linear correlation with the spread of time fields.

The observation of time cells across many cortical and subcortical areas suggests that the same neural mechanisms may be used in many circuits throughout the brain. As a result, the neural activity implicated in a variety of delay tasks may be the result of many networks optimizing a similar problem to that of delaying low-frequency signals recurrently along a low-dimensional manifold. Such networks would thus be participating in the temporal coding of a stimulus by representing its history across a delay interval.

6  Conclusion

We have discussed two main theoretical results. The first provides a method for accurately implementing continuous-time delays in recurrent spiking neural networks. This begins with a model description of the delay system and ends with a finite-dimensional representation of the input's history that is mapped onto the dynamics of the synapse. The second provides a method for harnessing a broad class of synapse models in spiking neural networks while improving the accuracy of such networks compared to standard NEF implementations. These extensions are validated in the context of the delay network.

Our extensions to the NEF significantly enhance the framework in two ways. First, they allow those deploying the NEF on neuromorphics to improve the accuracy of their systems given the higher-order dynamics of mixed-analog-digital synapses (Voelker, Benjamin et al., 2017; Voelker & Eliasmith, 2017). Second, they advance our understanding of the effects of additional biological constraints, including finite rise times and pure time delays due to action potential propagation. Not only can these more sophisticated synapse models be accounted for, but they may be harnessed to directly improve the network-level performance of certain systems.

We exploited this extension to show that it can improve the accuracy of discrete-time simulations of continuous neural dynamics. We also demonstrated that it can provide accurate implementations of delay networks with a variety of synapse models, allowing systematic exploration of the relationship between synapse- and network-level dynamics. Finally, we suggested that these methods provide new insights into the observed temporal properties of individual cell activity. Specifically, we showed that time cell responses during a delay task are well approximated by a delay network constructed using these methods. This same delay network nonlinearly encodes the history of an input stimulus across the delay interval (i.e., a rolling window) by compressing it into a low-dimensional state, with the length of its trajectory scaling with the product of the delay length and the input frequency.

While we have focused our attention on delay networks in particular, our framework applies to any linear time-invariant system. Moreover, although we have not shown it here, these methods also apply to nonlinear systems, as with the original NEF formulation. As a result, these methods characterize a very broad class of combinations of synapse- and network-level spiking dynamical neural networks.

Many important questions remain concerning the interactions of principles 1, 2, and 3. While the error in our transformations scales as O(1/√n) in the number of neurons due to independent spiking, it has been shown that near-instantaneous feedback may be used to collaboratively distribute these spikes and scale the error as O(1/n) (Boerlin, Machens, & Denève, 2013; Thalmeier, Uhlmann, Kappen, & Memmesheimer, 2016). This reduction in error has potentially dramatic consequences for the efficiency and scalability of neuromorphics by reducing total spike traffic (Boahen, 2017). However, it is currently unclear whether this approach can be applied in a more biologically plausible setting (e.g., using neurons with refractory periods) while retaining this linear scaling property. Similarly, we wish to characterize the network-level effects of spike-rate adaptation, especially at higher input frequencies, in order to understand the computations that are most accurately supported by more detailed neuron models. This will likely involve extending our work to account for nonlinear dynamical primitives and subsequently harnessing their effects (e.g., bifurcations) to improve certain classes of computations.

Appendix:  Supplementary Derivations

A.1  Normalized State-Space Delay

In this appendix, we symbolically transform equation 3.4 into a normalized state-space model that avoids the need to compute any factorials. We first do so for the special case in which the numerator's order is one less than the denominator's, since this provides the best approximation to the step response (Vajta, 2000). We begin by expanding equation 3.4:
formula
where the numerator and denominator coefficients follow from the expansion above.
This transfer function is readily converted into a state-space model in controllable canonical form:
formula
To eliminate the factorials appearing in these coefficients, we rescale each dimension of the state vector by an appropriate constant. This is a diagonal change of basis, which leaves the transfer function unchanged while rescaling the corresponding entries of the state-space matrices, and yields the equivalent state-space model:
formula
where the resulting coefficients no longer involve factorials; this follows from the choice of scale factors.
A similar derivation applies to the case in which the numerator and denominator have equal order, although it results in a pass-through term that is suboptimal for step responses. For brevity, we omit this derivation and instead simply state the result:
formula
where the coefficients are defined analogously to the previous case.

In either case, the state-space matrices depend on the delay length solely through a single scalar factor. As a result, we may control the length of the delay by adjusting the gain on the input and feedback signals. The NEF can be used to build such controlled dynamical systems without introducing multiplicative dendritic interactions or implausible on-the-fly connection weight scaling (Eliasmith & Anderson, 2000). The identification of this control factor is connected to a more general time-scaling property of the Laplace transform that we can exploit to modulate the width of any filter on the fly (in this case affecting the amount of delay; results not shown).
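
The rescaling used above is an instance of a diagonal change of basis: replacing (A, B, C) with (TAT⁻¹, TB, CT⁻¹) for any invertible diagonal T leaves the transfer function unchanged. The short check below verifies this numerically for an arbitrary system; it does not reproduce the specific scale factors of the derivation.

```python
import numpy as np
from scipy.signal import ss2tf

rng = np.random.default_rng(seed=0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
D = np.zeros((1, 1))

# Diagonal change of basis: scale the ith state dimension by an arbitrary constant.
T = np.diag(rng.uniform(0.5, 2.0, size=n))
A2, B2, C2 = T @ A @ np.linalg.inv(T), T @ B, C @ np.linalg.inv(T)

num1, den1 = ss2tf(A, B, C, D)
num2, den2 = ss2tf(A2, B2, C2, D)
assert np.allclose(num1, num2) and np.allclose(den1, den2)   # same transfer function
```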

A.2  Decoding Separate Delays from the Same Network

Although the delay network has its dynamics optimized for a single delay, we can still accurately decode any delay between zero and the optimized delay length from the same network. This means that the network is representing a rolling window (i.e., history) whose length is the optimized delay. This window forms a temporal code of the input stimulus.

To compute these other delays, we optimally approximate the shorter delay with a transfer function of the same order, such that the denominator (which provides the recurrent transformation, up to a change of basis) depends only on the original delay length, while the numerator (which provides the output transformation, up to a change of basis) depends on the relationship between the shorter and original delay lengths.

From equation 3.4, we may write the denominator as
formula
We then solve for the numerator as follows:
formula
By expanding this product and collecting similar terms, the correct numerator up to the required order is
formula
Therefore, the optimal readout for the shorter delay, given dynamics optimized for the original delay, is determined by the above linear transformation of the coefficients.
We remark that this transformation recovers the original readout when the two delay lengths coincide, by uniqueness of the Padé approximants and by equation 3.4. As a corollary, we have proved that the following combinatorial identity holds:
formula
For the case in which the numerator's order is one less than the denominator's, we may also apply the same state-space transformation from section A.1 to obtain the normalized coefficients for this readout (i.e., with the normalized state-space matrices from equation 3.5):
formula
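
A numerical sketch of this construction follows, under the reading that the alternative numerator is obtained by multiplying the fixed denominator with the Taylor series of the shorter delay and truncating to the numerator's order. The delay lengths and order are illustrative values, and the variable names are not from the original derivation.

```python
import numpy as np
from math import factorial
from scipy.interpolate import pade
from scipy.signal import tf2ss

theta = 0.1          # delay the recurrent dynamics were optimized for (illustrative)
theta_prime = 0.06   # shorter delay to decode from the same network (illustrative)
q = 6                # denominator order; numerator order is q - 1

# Fixed denominator: from the [q-1 / q] Pade approximant of exp(-theta*s).
taylor = [(-theta) ** k / factorial(k) for k in range(2 * q)]
num_theta, den_theta = pade(taylor, q, q - 1)
d = den_theta.coeffs[::-1]                    # denominator in ascending powers of s

# Alternative numerator: truncate exp(-theta_prime*s) * denominator to degree q - 1.
e = [(-theta_prime) ** k / factorial(k) for k in range(q)]
n_coeffs = [sum(e[k] * d[i - k] for k in range(i + 1)) for i in range(q)]

# Both readouts share the same (A, B) in controller canonical form;
# only C (the output transformation) changes between the two delays.
A, B, C_theta, _ = tf2ss(num_theta.coeffs, den_theta.coeffs)
_, _, C_prime, _ = tf2ss(np.array(n_coeffs)[::-1], den_theta.coeffs)
```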

A.3  Remarks on Choice of State-Space Model

Although the transfer function is in fact a unique description of the input-output behavior of an LTI system, the state-space model (equation 2.7) is not. In general, one may consider any invertible matrix T with the same shape as A and observe that the state-space model (TAT⁻¹, TB, CT⁻¹, D) has the same transfer function as (A, B, C, D). Thus, the state-space model is unique only up to a change of basis. However, in the NEF, the basis may be "absorbed" into the representation of the state vector by using the encoders in principle 1, which in turn determines the decoders from principle 2. In other words, considering an alternative state-space model is equivalent to considering a change of basis for the representation.

In practice, when aiming to accurately represent the state vector using few neurons, it is important to balance the relative range of values within each dimension, such that a typical state trajectory stays within the space represented by the distribution of encoders, consistent with the samples used to optimize the decoders (see equation 2.5) and with the dynamic range of each neuron. We balance the range of values by numerically computing the change of basis T that results in a "balanced realization" of the system (Laub, Heath, Paige, & Ward, 1987; Perev, 2011). We then set the encoders to be unit length and axis aligned and optimize each dimension independently using the methods from section 2.2. As mentioned in the main text, we occasionally include encoders that are scattered uniformly along the surface of the hypersphere using quasi–Monte Carlo sampling (specifically, the inverse transform method applied to the Sobol sequence with a spherical coordinate transform; Fang & Wang, 1994; Knight et al., 2016) to visualize a distributed representation. Finally, we occasionally scale each dimension by a diagonal transformation whose ith diagonal entry is estimated by simulating the desired system directly on a randomly sampled input. We also experimented with a diagonal transformation whose ith diagonal entry is the reciprocal of two times the sum of the Hankel singular values (Glover & Partington, 1987) of the corresponding subsystem. This has the effect of bounding the absolute value of each dimension above by 1 in the worst case (Khaisongkram & Banjerdpongchai, 2007). We have found that these methods of normalizing state-space models typically improve the robustness of our networks across a wide range of parameters.
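
For reference, a textbook square-root balancing routine is sketched below: solve the two Lyapunov equations for the controllability and observability Gramians, factor them, and take an SVD of the product of the factors (Laub et al., 1987). It assumes a stable, minimal system and is not necessarily the exact procedure used for our networks; note that it also yields the Hankel singular values mentioned at the end of this paragraph.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

def balancing_transform(A, B, C):
    """Return (T, hsv) such that (T A T^-1, T B, C T^-1) is a balanced realization.

    Assumes A is Hurwitz and (A, B, C) is minimal, so both Gramians are
    positive definite; hsv are the Hankel singular values.
    """
    # Gramians: A P + P A^T + B B^T = 0 and A^T Q + Q A + C^T C = 0.
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    R = cholesky(P, lower=True)            # P = R R^T
    L = cholesky(Q, lower=True)            # Q = L L^T
    U, hsv, _ = svd(L.T @ R)
    T = np.diag(hsv ** -0.5) @ U.T @ L.T   # balancing change of basis
    return T, hsv
```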

It is worth noting that the mappings from section 4 (with the exception of equations 4.10 and 4.13) do not alter the representation of the state vector. Disregarding these exceptions, the same choice of basis is conveniently carried over to the implemented network. Yet it is also the case that the dynamics of the system mapped by equation 4.13 do not depend on the chosen state-space model. This fact is proven implicitly in the following section by characterizing those dynamics in terms of the poles of the desired system and the synapse model alone.

A.4  Analysis of Poles Resulting from Equation 4.13

Consider the recurrent and input matrices determined by equation 4.13 when the derivatives of the input signal are inaccessible, and call the resulting dynamics the implemented system. Due to the ZOH assumption, the implemented system does not equal the desired system in general, and so this mapping does not technically satisfy definition 1. As discussed in section 4.4, the approximation satisfies equation 4.1 only when the input is held constant. To be clear, the two systems agree for constant inputs but not necessarily for time-varying ones. Thus, it is important to characterize the difference between the implemented and desired dynamics for general inputs.

To do so, we can examine the poles of the implemented transfer function, which are defined as the complex roots of its denominator. The poles of a system fully define the dynamics of its state (up to a change of basis). For instance, a system is exponentially stable if and only if every pole has a strictly negative real part. Furthermore, the poles of a (minimal) state-space model coincide with the eigenvalues of its state matrix A. Therefore, we may characterize the poles of the implemented system in terms of the poles of the desired system in order to understand the behavior of the implemented system.

We begin by deriving the eigenvalues of the mapped recurrent matrix. Let v be an eigenvector of A with eigenvalue λ, so that
formula
Hence, each eigenvalue of A yields a corresponding eigenvalue of the mapped recurrent matrix, and together these form its full set of eigenvalues. This is also true for equation 4.12, since its recurrent matrix is identical, but we do not need this fact.
The denominator of the implemented transfer function may now be factored over these eigenvalues. Therefore, its poles are the roots of the polynomials:
formula
A trivial set of roots is s = λ, and thus each pole of the desired system is also a pole of the implemented system, as desired. However, for a synapse of order k, there will also be k − 1 additional poles for every pole of the desired system. For this system to behave as desired given low-frequency inputs, the old poles must dominate the new ones; that is, each additional pole must decay much faster than the pole it accompanies.
To provide a specific example, let us consider the double exponential synapse with time constants τ1 and τ2,
formula
In this instance, the first case gives back the known poles, s = λ, and the second case provides the new poles, s = −λ − 1/τ1 − 1/τ2. Consequently, the poles of the implemented system are the poles of the desired system, duplicated and reflected horizontally about the line Re(s) = −(1/τ1 + 1/τ2)/2 (with the imaginary components unchanged).
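
The following check verifies this pole structure numerically, under the assumption (consistent with the derivation above) that each desired pole λ yields the implemented pole pair {λ, −λ − 1/τ1 − 1/τ2} as the roots of the per-eigenvalue quadratic. The time constants and test poles are arbitrary.

```python
import numpy as np

tau1, tau2 = 0.005, 0.008            # double exponential time constants (arbitrary)
shift = 1.0 / tau1 + 1.0 / tau2      # horizontal offset of the reflected poles

# Example "desired" poles (e.g., eigenvalues of the delay system's state matrix).
desired = [-10.0 + 30.0j, -10.0 - 30.0j, -25.0 + 0.0j]

for lam in desired:
    # Per-eigenvalue quadratic: (tau1*s + 1)(tau2*s + 1) = (tau1*lam + 1)(tau2*lam + 1).
    coeffs = [tau1 * tau2,
              tau1 + tau2,
              1.0 - (tau1 * lam + 1.0) * (tau2 * lam + 1.0)]
    roots = np.sort_complex(np.roots(coeffs))
    expected = np.sort_complex(np.array([lam, -lam - shift]))
    assert np.allclose(roots, expected)
print("each pole is duplicated and reflected about Re(s) = -(1/tau1 + 1/tau2)/2")
```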

Interestingly, the resulting quantity may be rewritten as the ratio of the arithmetic mean to the geometric mean of the two time constants, which may in turn be interpreted as a cross-entropy expression (Woodhouse, 2001).

Regardless, the implemented system behaves like the desired system when the new poles are sufficiently fast relative to the old ones. For the delay system (see equation 3.5), the pole magnitudes scale inversely with the delay length (in fact, the mean value across all poles achieves equality), and so we need the delay to be long relative to the synaptic time constants. This predicts that a delay system using the double exponential synapse, without access to the input's derivative, must necessarily implement a delay that is proportionally longer than the synapse's timescale; otherwise, the second-order dynamics from the synapse will dominate the system-level dynamics. We note that this requirement is most stringent for the alpha synapse (whose two time constants are equal) and least stringent in the low-pass limit (as one time constant approaches zero).

A.5  Relationship to Discretization

As an aside, there is an interesting connection between definition 1 and the well-known problem of discretizing a linear dynamical system. A discrete LTI system (see equation 4.3) is identical to a continuous LTI system (see equation 2.7), with three adjustments: (1) the integrator is replaced by a