## Abstract

Researchers building spiking neural networks face the challenge of improving the biological plausibility of their model networks while maintaining the ability to quantitatively characterize network behavior. In this work, we extend the theory behind the neural engineering framework (NEF), a method of building spiking dynamical networks, to permit the use of a broad class of synapse models while maintaining prescribed dynamics up to a given order. This theory improves our understanding of how low-level synaptic properties alter the accuracy of high-level computations in spiking dynamical networks. For completeness, we provide characterizations for both continuous-time (i.e., analog) and discrete-time (i.e., digital) simulations. We demonstrate the utility of these extensions by mapping an optimal delay line onto various spiking dynamical networks using higher-order models of the synapse. We show that these networks nonlinearly encode rolling windows of input history, using a scale invariant representation, with accuracy depending on the frequency content of the input signal. Finally, we reveal that these methods provide a novel explanation of time cell responses during a delay task, which have been observed throughout hippocampus, striatum, and cortex.

## 1 Introduction

One of the central challenges in computational neuroscience is understanding how dynamic stimuli can be processed by neural mechanisms to drive behavior. Recurrent connections, cellular responses, and synaptic responses are ubiquitous sources of dynamics throughout the mammalian brain that must work in concert to support dynamic information processing (Kandel, Schwartz, & Jessell, 2000). How these low-level mechanisms interact in order to encode information about the history of a stimulus, across time, is the subject of considerable study. One approach to better understanding these mechanisms is to construct models that capture central features of neural dynamics while implementing higher-level information processing.

The neural engineering framework (NEF; Eliasmith & Anderson, 1999, 2003) proposes a method to model such dynamical systems in networks of spiking neurons (see Abbott, DePasquale, & Memmesheimer, 2016; Denève & Machens, 2016, for reviews of other methods). The NEF has been used to construct a wide variety of neural models, including a 2.3 million neuron functioning model of the human brain, capable of performing perceptual, motor, and cognitive tasks (Eliasmith et al., 2012). This model incorporates many kinds of observed neural dynamics, including oscillations, sustained activity, and point attractor dynamics. The flexibility of the NEF has led to its deployment on mixed-analog-digital neuromorphic chips (Choudhary et al., 2012; Corradi, Eliasmith, & Indiveri, 2014; Voelker, Benjamin, Stewart, Boahen, & Eliasmith, 2017; Voelker & Eliasmith, 2017) and digital architectures (Bekolay et al., 2013; Wang, Hamilton, Tapson, & van Schaik, 2014; Mundy, Knight, Stewart, & Furber, 2015; Berzish, Eliasmith, & Tripp, 2016). Consequently, the NEF provides a practical method for programming neuromorphics, thus helping the field realize its promise of a low-energy computing platform that emulates core principles of the nervous system (Boahen, 2017).

However, the NEF typically assumes an exponential model of the postsynaptic current (PSC) evoked by an action potential, which has a biologically implausible, instantaneous rise time. This model is also known as a first-order low-pass filter. In contrast, the synapse models used in mixed-analog-digital neuromorphic chips are nonideal, featuring higher-order dynamics due to parasitic capacitances (Voelker, Benjamin et al., 2017). Similarly, the synapse models commonly used in biological models incorporate distinct rise and fall times due to separate timescales of transmitter binding and unbinding, as well as axonal transmission delays due to the finite-velocity propagation of action potentials (Roth & van Rossum, 2009). To widen the scope of the NEF, we characterize the network-level effects of these higher-order synapse models and harness them to implement certain classes of dynamical systems with improved accuracy.

A particularly important dynamical system that has not been implemented using the NEF is the pure continuous-time delay line. This system must represent a rolling window of input history. We provide a novel derivation of an optimal low-dimensional linear approximation to a continuous-time delay and prove that the resulting delay network nonlinearly encodes its input across the delay interval. This network uses a scale-invariant representation, with a level of accuracy that depends on the input frequency, chosen dimensionality (i.e., the order of the approximation), and particular synapse model. Low-dimensional representations (e.g., ) of low-frequency signals (e.g., Hz) are pervasive in biological systems (Cunningham & Byron, 2014; Waernberg & Kumar, 2017; Pulvermüller, Birbaumer, Lutzenberger, & Mohr, 1997; Singer, 1999). To our knowledge, this work is the first to demonstrate that such a temporal code may be accurately implemented using a spiking dynamical network.

Reservoir computing approaches, such as liquid state machines (Maass, Natschläger, & Markram, 2002) and echo state networks (Jaeger, 2001), may be used to approximate a delay line. However, since these networks use randomly chosen feedback weights, it is likely that they do not efficiently produce the dynamics of a pure delay. Rather, these networks represent a random variety of nonlinear memory traces (Lukoševičius, 2012). Discrete approaches to short-term memory, such as those taken by White, Lee, and Sompolinsky (2004) and Ganguli, Huh, and Sompolinsky (2008), while optimal in an information-theoretic sense, rely fundamentally on single time-step delays between rate-based neurons. In contrast, the method that we propose here works independently of the simulation time step, and is optimal assuming the population of spiking neurons—coupled with some model of the synapse—accurately represents a low-dimensional, low-frequency vector space. Furthermore, our framework is extended to account for arbitrary linear synapses, which improves our understanding of the relationship between synapse models and network-level computation. A detailed comparison of our method to reservoir computing remains the subject of future work.

A distinguishing feature of this work is that we begin the problem with a mathematical description of the ideal dynamics for the delay line and then proceed by mapping this description onto a spiking neural network (using the NEF and its extensions). This is in contrast to methods that use online learning or backpropagation through time with rate-based neurons (De Vries & Principe, 1992; Sussillo & Abbott, 2009) or spiking neurons (Nicola & Clopath, 2016; Huh & Sejnowski, 2017; Gilra & Gerstner, 2017; Alemi, Machens, Denève, & Slotine, 2017). That is, we do not require any sophisticated training regimes involving online learning or backpropagation; our delay network is trained offline using convex optimization (i.e., regularized least squares), which yields a more complete understanding of its employed representation and dynamics.

The remainder of this article is structured as follows. In section 2, we introduce the NEF, placing an emphasis on the mapping of continuous-time linear systems onto the canonical low-pass model of the synapse. In section 3.1, we use the NEF to derive a spiking dynamical network that delays its input signal by some fixed length of time and then analyze this network in section 3.2 to characterize how it nonlinearly encodes the stimulus across a rolling window. In section 4, we extend the NEF to characterize the effects of higher-order synapse models and account for both continuous-time (i.e., for analog hardware) and discrete-time (i.e., for digital hardware) simulations. In section 5, we exploit the methods from section 4 to demonstrate their utility in computing delays and provide insights into possible neural mechanisms. Specifically, we harness a small delay in the synapse model to improve the accuracy of a larger network-level delay. And we illustrate the relevance of our delay network to biological modeling through qualitative and quantitative comparisons to the responses of time cells (Eichenbaum, 2014), suggesting a new characterization of how temporal representations may arise in the brain.

## 2 Neural Engineering Framework

The neural engineering framework (NEF; Eliasmith & Anderson, 1999, 2003) consists of three mathematical principles used to describe neural computation. The NEF is most commonly applied to building dynamic (i.e., recurrent) spiking neural networks, but also applies to nonspiking and feedforward networks. Its primary strength lies in providing a well-understood and efficient means of engineering spiking neural models, and programming neuromorphic hardware, to perform mathematically described computations (Eliasmith, 2013; Boahen, 2017). In this section, we provide an overview of these methods applied to training both feedforward and recurrent connection weights in order to implement linear dynamical systems, although these methods extend to nonlinear dynamical systems as well (Voelker, Benjamin et al., 2017; Voelker & Eliasmith, 2017). We present this framework in a manner that is consistent with the Nengo 2.4.0 simulator (Bekolay et al., 2013), which implements the NEF among other approaches.

### 2.1 Principle 1: Representation

Let denote a -dimensional time-varying signal that is to be represented by a population of spiking neurons. To describe this representation, we define a nonlinear encoding and a linear decoding that together determine how neural activity relates to the represented vector.

Having defined an encoding, we introduce a postsynaptic filter , which acts as the synapse model by capturing the dynamics of a receiving neuron's synapse. In particular, this filter models the postsynaptic current (PSC) triggered by action potentials arriving at the synaptic cleft. For now, we fix to be an exponentially decaying PSC with time constant , which is equivalent (in the Laplace domain) to the canonical first-order low-pass filter (also known as a leaky integrator). This is the conventional choice of synapse in the NEF, since it strikes a convenient balance between mathematical simplicity and biological plausibility (Eliasmith & Anderson, 2003). In section 4, we return to this point by considering more general synapse models that are capable of capturing a much broader variety of PSCs.

^{1}To complete the characterization of neural representation, we solve for the optimal linear decoders . This optimization is identical for principles 1 and 2, as discussed below.

### 2.2 Principle 2: Transformation

^{2}by applying an alternate matrix of decoders to the same activities: For both principles 1 and 2, we optimize for over the domain of the signal, , which is typically the unit -ball or the unit -cube . To determine these decoders, we first let be the limiting average firing rate of the th neuron under the constant input : For nonadaptive neuron models, equation 2.4 reduces to encoding using a rate model. For adaptive neuron models, other definitions for may be considered, but we limit our discussion here to the (nonadaptive) spiking LIF model. To account for the variance introduced by neural spiking and other sources of uncertainty, we introduce a white noise term . The optimality criterion for is therefore Note that this optimization depends on only for , as opposed to depending on the signal . In other words, the optimization is determined strictly by the distribution of the signal, and not according to its particular dynamics. Furthermore, this is a convex optimization problem, which may be solved by uniformly sampling (Voelker, Gosmann, & Stewart, 2017) and applying a standard regularized least-squares solver to the sampled data (Bekolay et al., 2013). Monte Carlo sampling introduces error into the integral from equation 2.5, where is the number of samples, but this can be improved to —effectively squaring —by the use of quasi–Monte Carlo methods (Fang & Wang, 1994; Knight, Voelker, Mundy, Eliasmith, & Furber, 2016). Nengo also supports alternative decoder solvers that optimize variations of equation 2.5 (e.g., Friedl et al., 2016; Kauderer-Abrams et al., 2017), but we do not use them here.

The accuracy of this approach relies on being a suitable proxy for whenever . This zeroth-order approximation clearly holds in the steady state for constant and turns out to be ideal in practice for low-frequency (Eliasmith & Anderson, 2003, appendix F.1), and likewise for that filter out high frequencies (i.e., when the synaptic time-constant is large).

This factorization has important consequences for the computational efficiency of neural simulations. The crucial difference between the factorized and nonfactorized forms is that it takes operations per simulation time step to implement this dot product in the factored form of equation 2.6, as opposed to operations for a full-rank weight matrix. Since is typically held constant, this yields a factor improvement in simulation time. Similarly, this factorization yields an reduction in memory, which significantly improves the scaling of neuromorphics (Mundy et al., 2015). In essence, this factorization provides a way to describe the network's latent state vector . This, in turn, permits us to perform useful computations by transforming the state vector with error in the presence of spike-noise (Eliasmith & Anderson, 2003).

### 2.3 Principle 3: Dynamics

^{3}To be more precise, our approach is to represent the state vector in a population of spiking neurons (principle 1; see equation 2.1) such that this vector is obtained by filtering some linearly decoded spike trains with a leaky integrator (principle 2; see equation 2.3). Thus, the goal of principle 3 is to determine the transformations required to implement equation 2.7, given that is obtained by some convolution with a leaky integrator rather than the perfect integrator depicted in Figure 1.

Principle 3 states that in order to implement equation 2.7 in a population of neurons that represents , we must compensate for the effect of “replacing” the integrator with a leaky integrator (compare Figures 1 and 2), that is, by driving the synapse with instead of only . This compensation is achieved as follows: implement the recurrent transformation and the input transformation , but use the same output transformation , and the same pass-through transformation (Eliasmith & Anderson, 2003). Specifically, this may be implemented in a spiking dynamical network by representing via principle 1 and then using principle 2 to decode the needed transformations. The resulting model is summarized in Figure 2.

The correctness of this “mapping” procedure relies on three assumptions: (1) the synapse model is equation 2.8, (2) the network is simulated in continuous time (or the discrete time step is sufficiently small), and (3) the necessary representations and transformations are sufficiently accurate such that the approximation error from equation 2.3 is negligible. In other words, assuming is sufficiently large, the architecture of Figure 2 is equivalent to Figure 1, but using the leaky integrator instead of an integrator as the dynamical primitive (Eliasmith & Anderson, 2003). Consequently, both systems compute the exact same signals and . In section 4.1, we provide a novel proof of this equivalence. In sections 4.2 to 4.4, we extend principle 3 to remove the first and second assumptions.

Principle 3 is useful for accurately implementing a wide class of dynamical systems (e.g., integrators, oscillators, attractor networks) to solve specific problems that frequently arise in neural modeling (e.g., Eliasmith & Anderson, 2000; Singh & Eliasmith, 2004, 2006; Eliasmith, 2005). Furthermore, the class of state-space models is isomorphic to the class of all finite-dimensional causal linear filters or, equivalently, all rational (finite-order) proper transfer functions, which is a large and useful class of dynamical systems employed widely in control applications (Brogan, 1991). Given the ability of principle 2 to compute nonlinear functions (see equation 2.3), principle 3 also naturally generalizes to nonlinear dynamical systems, but this is outside the scope of this work; we refer readers to Voelker, Benjamin et al. (2017) and Voelker and Eliasmith (2017) for recent nonlinear extensions.

## 3 Spiking Delay Networks

^{4}expressed as where denotes a Dirac delta function shifted backward in time by . This system takes a time-varying scalar signal, , and outputs a purely delayed version, . The task of computing this function both accurately and efficiently in a biologically plausible, spiking, dynamical network is a significant theoretical challenge that, to our knowledge, has previously remained unsolved.

The continuous-time delay is worthy of detailed consideration for several reasons. First, it is nontrivial to implement using continuous-time spiking dynamical primitives. Specifically, equation 3.1 requires that we maintain a rolling window of length (i.e., the history of , going seconds back in time). Thus, computing a delay of seconds is just as hard as computing every delay of length for all . Since any finite interval of contains an uncountably infinite number of points, an exact solution for arbitrary requires that we maintain an uncountably infinite amount of information in memory. Second, the delay provides us with a window of input history from which to compute arbitrary nonlinear functions across time. For instance, the spectrogram of a signal may be computed by a nonlinear combination of delays, as may any finite impulse response (FIR) filter. Third, delays introduce a rich set of interesting dynamics into large-scale neural models, including oscillatory bumps, traveling waves, lurching waves, standing waves, aperiodic regimes, and regimes of multistability (Roxin, Brunel, & Hansel, 2005). Fourth, a delay line can be coupled with a single nonlinearity to construct a network displaying many of the same benefits as reservoir computing (Appeltant et al., 2011). Finally, examining the specific case of the continuous-time delay introduces several methods and concepts we employ in generally extending the NEF (see section 4).

### 3.1 Implementation

After choosing , we may numerically find a state-space model that satisfies equation 3.3 using standard methods^{5} and then map this system onto the synapse using principle 3. However, naively applying this conversion leads to numerical issues in the representation (i.e., dimensions that grow exponentially in magnitude) due in part to the factorials in equation 3.4.

^{6}The choice of corresponds to the dimensionality of the latent state vector that is to be represented by principle 1 and transformed by principle 2. Principle 3 may then be used to map equation 3.5 onto a spiking dynamical network to accurately implement an optimal low-frequency approximation of the delay.

To demonstrate, we implement a 1 s delay of Hz band-limited white noise using 1000 recurrently connected spiking LIF neurons representing a six-dimensional vector space (see Figure 3). The connections between neurons are determined by applying principle 3 (see section 2.3) to the state-space model derived above (see equation 3.5, ) via the Padé approximants of the delay. The normalized root-mean-squared error (NRMSE) of the output signal is 0.048 (normalized so that 1.0 would correspond to random chance). This is achieved without appealing to the simulation time step ( ms); in fact, as shown in section 5.2, the network accuracy improves as approaches zero due to the continuous-time assumption mentioned in section 2.3 (and resolved in section 4.2).

### 3.2 Temporal Coding

In Figure 4, we take different linear transformations of the same state vector by evaluating equation 3.6 at various delays between 0 and to decode the rolling window of input from the state of the system.^{7} This demonstrates that the delay network compresses the input's history (lasting seconds) into a low-dimensional state.

In Figure 5, we sweep equation 3.6 across to visualize the temporal “basis functions” of the delay network. This provides a way to understand the relationship between the chosen state-space representation (i.e., the -dimensional ) and the underlying window representation (i.e., the infinite-dimensional ). In particular, each basis function corresponds to the continuous window of history represented by a single dimension of the delay network. The instantaneous value of each dimension acts as a coefficient on its basis function, contributing to the representation of the window at that time. Overall, the entire state vector determines a linear combination of these basis functions to represent the window. This is analogous to the static function representation explored previously within the context of principles 1 and 2 (Eliasmith & Anderson, 2003).

The encoder of each neuron can also be understood directly in these terms as taking a linear combination of the basis functions (via equation 2.1). Each neuron nonlinearly encodes a projection of the rolling window onto some “preferred window” determined by its own encoder. Since the state vector is encoded by heterogeneous neural nonlinearities, the population's spiking activity supports the decoding of nonlinear functions across the entire window (i.e., functions that we can compute using principles 1 and 2). Therefore, we may conceptualize the delay network as a temporal coding of the input stimulus, which constructs a low-dimensional state, representing an entire window of history, to encode the temporal structure of the stimulus into a nonlinear high-dimensional space of neural activities.

To more thoroughly characterize the delay dynamics, we analyze the behavior of the delay network as the dimensionality is increased (see Figure 6). Specifically, we perform a standard principal component analysis (PCA) on the state vector for the impulse response and vary the order from to . This allows us to visualize a subset of the state vector trajectories, via projection onto their first three principal components (see Figure 6, top). The length of this trajectory over time distinguishes different values of (see Figure 6, bottom). This length curve is approximately logarithmic when , convex when , and sigmoidal when . To generate this figure, we use a delay of s, but in fact, this analysis is scale invariant with time. This means that other delays will simply stretch or compress the impulse response linearly in time (not shown).

The delay network is scale invariant with the delay length over input frequency, that is, the accuracy for a chosen order is a function of (see the units in Figure 4, for instance). More specifically, for a fixed approximation error, the delay length scales as , where is the input frequency. Then the accuracy of the mapped delay is a function of the relative magnitude of the delay length to , whose exact shape depends on the considered synapse model. To determine these functions for a wide class of synapses, we proceed by extending the NEF.

## 4 Extending the Neural Engineering Framework

With the example of building a continuous-time delay in hand, we now proceed to extend principle 3 to arbitrary linear synapses. We focus on building a comprehensive theory for linear systems, but many of these same techniques also carry over to the case of nonlinear dynamical systems with heterogeneous synapses (Voelker, Benjamin et al., 2017; Voelker & Eliasmith, 2017).

Let be the transfer function for the linear dynamics that we wish to implement (see equations 2.7 and 3.3), and let be the transfer function for an arbitrary linear synapse model (). As stated in section 2.3, introducing the synapse model means replacing the integrator () with . This is equivalent to replacing with . Notably, substituting for the integrator results in the transfer function , which no longer implements the original, desired dynamics . However, we would like to ensure that the new dynamics match the originally specified . The key insight is that we can determine a new function, , such that . That is, we can solve for a function that provides the original dynamics when implemented using the transfer function as the dynamical primitive. This is formalized by the following definition:

This definition compactly expresses the notion of a change of dynamical primitive in that is mapped from the canonical primitive, , onto some new primitive, . Trivially, maps itself onto . Nontrivial examples are given throughout sections 4.1 to 4.4.

Therefore, supposing satisfies definition ^{1} and that it is convertible to a state-space model (see equation 2.7), then Figure 7 is just another form of Figure 1, but with the integrator replaced by the synapse. Note that this construction, on its own, does not specify how to find a satisfying or whether such a function exists, or whether it can be converted to a state-space model. We provide several examples leading to such a specification in section 4.4.

### 4.1 Continuous Low-Pass Synapse

^{1}. Therefore, by equation 4.2, This completes our novel proof of principle 3 from section 2.3.

### 4.2 Discrete Low-Pass Synapse

^{2}. Therefore, by equation 4.5, provides an exact implementation of the desired system for digital architectures, regardless of the simulation time step (assuming ZOH).

### 4.3 Delayed Continuous Low-Pass Synapse

### 4.4 General Linear Synapse

As well, in section A.5, we demonstrate how to use equation 4.11 to model synapses with pure delays and to prove a complete connection to ZOH discretization. Specifically, by the use of Padé approximants, we may rewrite any transfer function with a Taylor series expansion (even those that are irrational or improper) in the form of equation 4.11, albeit with some radius of convergence. This permits us to model, for instance, synapses with pulse-extended responses (Voelker, Benjamin et al., 2017).^{9} Similarly, the delayed low-pass, equation 4.8, may be expressed in the form of equation 4.11. Finally, any linear combinations of the aforementioned synapse models will also be a member of this class. Nonlinear synapse models, such as conductance-based synapses (e.g., Destexhe, Mainen, & Sejnowski, 1994a, equation 6), are a current subject of study in the NEF (Stöckel, Voelker, & Eliasmith, 2017).

Nevertheless, such solutions can still be implemented, as is, in practice. Since is the th-order differential operator, this form of states that we must supply the th-order input derivatives for all . When , as in section 4.1, this is trivial (no derivatives are needed). For , let us first define . Then equation 4.12 shows that the ideal state-space model must implement the input transformation as a linear combination of input derivatives, . If the derivatives of the input are available to the model, then the we described may be used to precisely implement the desired .

To our knowledge, this specific state-space architecture has not been explored in theory or in practice. Given that equation 4.12 requires the input derivatives to accurately compute low-dimensional network-level dynamics, this strongly suggests that it is important to represent and compute derivatives in neural systems. Methods of computing derivatives have been explored by Tripp and Eliasmith (2010) within the NEF. As well, it has long been suggested that adaptive neural mechanisms (e.g., synaptic depression or spike rate adaptation) may also play a fundamental role in computing such derivatives (Abbott & Regehr, 2004; Lundstrom, Higgs, Spain, & Fairhall, 2008). However, there has typically been less emphasis placed on understanding temporal differentiation in comparison to other temporal operators, such as integration (Tripp & Eliasmith, 2010).

## 5 Results

We are now in a position to bring together the two main themes of this work: implementing delays and extending the NEF to employ a wide variety of synapse models. To do so, we provide two example applications of our theory. First, we demonstrate that we can map the delay system onto a variety of synapse models used in practice. We show that including an axonal transmission delay in the synapse model significantly improves the ability of the network to compute continuous-time delays (and, by extension, any system that can be expressed in terms of delayed signals). Second, we explore intriguing similarities between our delay network responses and the recently discovered time cells (Eichenbaum, 2014; Tiganj, Hasselmo, & Howard, 2015), suggesting that this theory may provide a new understanding of these observed temporal responses in hippocampus, stratium, medial prefrontal cortex (mPFC), and elsewhere in cortex (Mello, Soares, & Paton, 2015; Luczak, McNaughton, & Harris, 2015).

### 5.1 Methods

All networks were built and simulated using Nengo 2.4.0 (Bekolay et al., 2013), a Python tool for simulating large-scale neural networks, with the nengolib 0.4.1 extension (Voelker, 2017; Voelker & Eliasmith, 2017). Simulations were run on a standard desktop computer^{10} using Python 2.7.3, Numpy 1.11.3, and SciPy 0.18.0rc1, linked with the Intel Math Kernel Library (MKL).

All simulations used a 1 ms time step (with exception to Figure 9, which used a time step of 0.01 ms) and Nengo's default configurations for heterogeneous tuning curves and spiking LIF parameters. As detailed in section A.3, encoders were either sampled from the unit-axis vectors or scattered uniformly along the surface of the hypersphere using quasi–Monte Carlo methods. We further normalized the delay systems using balanced realizations and Hankel singular values.

The architecture of each network follows the setup described by the NEF in section 2. A tutorial by Sharma, Aubin, and Eliasmith (2016) provides some additional details regarding the software application of NEF methods. Figures were created using Seaborn (Waskom et al., 2015) and are reproducible via https://github.com/arvoelke/delay2017.

### 5.2 Delay Networks with Higher-Order Synapses

We begin by making the practical point that it is crucial to account for the effect of the simulation time step in digital simulations if the time step is not sufficiently small relative to the timescale of the desired network-level dynamics. To demonstrate this, we simulate a 27-dimensional delay network using 1000 spiking LIF neurons, implementing a 0.1 s delay of 50 Hz band-limited white noise. We vary the simulation time step () from 0.1 ms to 2 ms. The accuracy of our extension does not depend on (see Figure 8, left). When ms (the default in Nengo), the standard principle 3 mapping (see equation 4.6) obtains an NRMSE of 1.425 ( worse than random chance) versus 0.387 for the discrete low-pass mapping that accounts for (see equation 4.7), a reduction in error. As approaches 0, the two methods become equivalent.

More to the point, we can analyze the delay network's frequency response when using a delayed continuous low-pass synapse (see equation 4.8) instead of the canonical low-pass (see equation 2.8) as the dynamical primitive. This provides a direct measure of the possible improvement gains when using the extension. Figure 8, right compares the use of principle 3 (which accounts for but ignores to our extension (which fully accounts for both; see section 4.3) when . The figure reveals that increasing the dimensionality improves the accuracy of our extension while magnifying the error from principle 3. In the worst case, the principle 3 mapping has an absolute error of nearly . In practice, saturation from the neuron model bounds this error by the maximum firing rates. Regardless, it is clearly crucial to account for axonal transmission delays to accurately characterize the network-level dynamics.

To more broadly validate our NEF extensions from section 4, we map the delay system onto (1) a continuous low-pass synapse (see section 4.1), (2) a delayed continuous low-pass synapse (see section 4.3), and (3) a continuous double exponential synapse (see section 4.4). We apply each extension to construct delay networks of 2000 spiking LIF neurons. To compare the accuracy of each mapping, we make the time step sufficiently small (s) to emulate a continuous-time setting. We use the Padé approximants of order for both equations 3.4 and 4.10. For the delayed low-pass, we again fix and . For the double exponential, we fix and . Expressing these parameters as dimensionless constants keeps our results scale invariant with .

Figure 9 reveals that axonal delays may be effectively amplified 10-fold while reducing the NRMSE by compared to the low-pass (see Figure 9, right; NRMSE for low-pass = 0.702, delayed low-pass = 0.205, and double exponential = 0.541). The double exponential synapse outperforms the low-pass, despite the additional poles introduced by the ZOH assumption in equation 4.13 (see section A.4 for analysis). This is because the double exponential filters the spike noise twice. Likewise, by exploiting an axonal delay, the same level of performance (e.g., error) may be achieved at approximately 1.5 times higher frequencies, or equivalently for 1.5 times longer network delays, when compared to the low-pass synapse (see Figure 9, left). In summary, accounting for higher-order synaptic properties allows us to harness the axonal transmission delay to more accurately approximate network-level delays in spiking dynamical networks.

Together, these results demonstrate that our extensions can significantly improve the accuracy of high-level network dynamics. Having demonstrated this for delays, in particular, suggests that the extension is useful for a wide variety of biologically relevant networks. To make this point more concrete, we turn to a consideration of time cells.

### 5.3 Time Cells

We now describe a connection between the delay network from section 3 and recent neural evidence regarding time cells. Time cells were initially discovered in the hippocampus and proposed as temporal analogs of the more familiar place cells (Eichenbaum, 2014). Similar patterns of neural activity have since been found throughout striatum (Mello et al., 2015) and cortex (Luczak et al., 2015) and have been extensively studied in the rodent mPFC (Kim, Ghim, Lee, & Jung, 2013; Tiganj, Jung, Kim, & Howard, 2016).

Interestingly, we find that our delay network produces qualitatively similar neural responses to those observed in time cells. This is shown in Figure 10, by comparing neural recordings from mPFC (Tiganj et al., 2016, Figures 4C and 4D) to the spiking activity from a network implementing a delay of the same length used in the original experiments. Specifically, in this network, a random population of 300 spiking LIF neurons maps a 4.784 s delay onto an alpha synapse ( s) using our extension. The order of the approximation is (see equation 3.5), and the input signal is a rectangular pulse beginning at s and ending at s (height ). The simulation is started at s and stopped at s.

We also note a qualitative fit between the length curve for in Figure 6 and the peak response times in Figure 10. Specifically, Figure 6 (bottom) models the nonuniform distribution of the peak response time of the cells as the length of the trajectory of through time. Implicit in this model are the simplifying assumptions that encoders are uniformly distributed and that the L2-norm of the state vector remains constant throughout the delay period. Nevertheless, this model produces a qualitatively similar curve when to both peak response times from Figure 10 (right; see overlay).

More quantitatively, we performed the same analysis on our simulated neural activity as Tiganj et al. (2016) performed on the biological data to capture the relationship between the peak and width of each time cell. Specifically, we fit the spiking activity of each neuron with a gaussian to model the peak time () and the standard deviation () of each cell's “time field.”^{11} This fit was repeated for each of the 250 simulated spiking LIF neurons that remained after selecting only those that had at least of their spikes occur within the delay interval. The correlation between and had a Pearson's coefficient of 0.68 (), compared to 0.52 () for the biological time cells. An ordinary linear regression model linking (independent variable) with (dependent variable) resulted in an intercept of (standard error) and a slope of for our simulated data, compared to and , respectively, for the time cell data. We note that we used the same bin size of 1 ms, modeled the same delay length, and did not perform any parameter fitting beyond the informal choices of cutoff, dimensionality (), area of the input signal (1.5), and synaptic time constant ( s).

Neural mechanisms previously proposed to account for time cell responses have either been speculative (Tiganj et al., 2016) or rely on gradually changing firing rates from a bank of arbitrarily long, ideally spaced low-pass filters (Shankar & Howard, 2012; Howard et al., 2014; Tiganj et al., 2015; Tiganj, Shankar, & Howard, 2017). It is unclear if such methods can be implemented accurately and scalably using heterogeneous spiking neurons. We suspect that robust implementation is unlikely given the high precision typically relied on in these abstract models.

In contrast, our proposed spiking model has its network-level dynamics derived from first principles to optimally retain information throughout the delay interval, without relying on a particular synapse model or bank of filters. All of the neurons recurrently work together in a low-dimensional vector space to make efficient use of neural resources. By using the methods of the NEF, this solution is inherently robust to spiking noise and other sources of uncertainty. Furthermore, our explanation accounts for the nonlinear distribution of peak firing times as well as its linear correlation with the spread of time fields.

The observation of time cells across many cortical and subcortical areas suggests that the same neural mechanisms may be used in many circuits throughout the brain. As a result, the neural activity implicated in a variety of delay tasks may be the result of many networks optimizing a similar problem to that of delaying low-frequency signals recurrently along a low-dimensional manifold. Such networks would thus be participating in the temporal coding of a stimulus by representing its history across a delay interval.

## 6 Conclusion

We have discussed two main theoretical results. The first provides a method for accurately implementing continuous-time delays in recurrent spiking neural networks. This begins with a model description of the delay system and ends with a finite-dimensional representation of the input's history that is mapped onto the dynamics of the synapse. The second provides a method for harnessing a broad class of synapse models in spiking neural networks while improving the accuracy of such networks compared to standard NEF implementations. These extensions are validated in the context of the delay network.

Our extensions to the NEF significantly enhance the framework in two ways. First, it allows those deploying the NEF on neuromorphics to improve the accuracy of their systems given the higher-order dynamics of mixed-analog-digital synapses (Voelker, Benjamin et al., 2017; Voelker & Eliasmith, 2017). Second, it advances our understanding of the effects of additional biological constraints, including finite rise times and pure time delays due to action potential propagation. Not only can these more sophisticated synapse models be accounted for, but they may be harnessed to directly improve the network-level performance of certain systems.

We exploited this extension to show that it can improve the accuracy of discrete-time simulations of continuous neural dynamics. We also demonstrated that it can provide accurate implementations of delay networks with a variety of synapse models, allowing systematic exploration of the relationship between synapse- and network-level dynamics. Finally, we suggested that these methods provide new insights into the observed temporal properties of individual cell activity. Specifically we showed that time cell responses during a delay task are well approximated by a delay network constructed using these methods. This same delay network nonlinearly encodes the history of an input stimulus across the delay interval (i.e., a rolling window) by compressing it into a -dimensional state, with length scaling as , where is the input frequency.

While we have focused our attention on delay networks in particular, our framework applies to any linear time-invariant system. As well, though we have not shown it here, as with the original NEF formulation, these methods also apply to nonlinear systems. As a result, these methods characterize a very broad class of combinations of synapse- and network-level spiking dynamical neural networks.

Many important questions still remain concerning the interactions of principles 1, 2, and 3. While the error in our transformations scale as due to independent spiking, it has been shown that near-instantaneous feedback may be used to collaboratively distribute these spikes and scale the error as (Boerlin, Machens, & Denève, 2013; Thalmeier, Uhlmann, Kappen, & Memmesheimer, 2016). This reduction in error has potentially dramatic consequences for the efficiency and scalability of neuromorphics by reducing total spike traffic (Boahen, 2017). However, it is currently unclear whether this approach can be applied to a more biologically plausible setting (e.g., using neurons with refractory periods) while retaining this linear scaling property. Similarly, we wish to characterize the network-level effects of spike-rate adaptation, especially at higher input frequencies, in order to understand the computations that are most accurately supported by more detailed neuron models. This will likely involve extending our work to account for nonlinear dynamical primitives and subsequently harness their effects (e.g., bifurcations) to improve certain classes of computations.

## Appendix: Supplementary Derivations

### A.1 Normalized State-Space Delay

In either case, and depend on the delay length solely by the scalar factor . As a result, we may control the length of the delay by adjusting the gain on the input and feedback signals. The NEF can be used to build such controlled dynamical systems without introducing multiplicative dendritic interactions or implausible on-the-fly connection weight scaling (Eliasmith & Anderson, 2000). The identification of this control factor is connected to a more general property of the Laplace transform, for all , that we can exploit to modulate the width of any filter on-the-fly (in this case affecting the amount of delay; results not shown).

### A.2 Decoding Separate Delays from the Same Network

Although the delay network has its dynamics optimized for a single delay , we can still accurately decode any delay from the same network. This means that the network is representing a rolling window (i.e., history) of length . This window forms a temporal code of the input stimulus.

To compute these other delays, we optimally approximate with a transfer function of order , such that the denominator (which provides us with the recurrent transformation up to a change of basis) depends only on , while the numerator (which provides us with the output transformation up to a change of basis) depends on some relationship between and .

### A.3 Remarks on Choice of State-Space Model

Although the transfer function is in fact a unique description of the input-output behavior of an LTI system, the state-space model (equation 2.7) is not. In general, one may consider any invertible matrix with the same shape as and observe that the state-space model has the same transfer function as . Thus, the state-space model is unique only up to a change of basis. However, in the NEF, the basis may be “absorbed” into the representation of by using the encoders in principle 1, which in turn results in the decoders from principle 2. In other words, considering an alternative state-space model is equivalent to considering a change of basis for the representation.

In practice, when aiming to accurately represent using few neurons, it is important to balance the relative range of values within each dimension, such that a typical trajectory for stays within the space represented by the distribution of encoders, consistent with the samples of (see equation 2.5), and the dynamic range of each neuron. We balance the range of values by numerically computing the that results in a “balanced realization” of (Laub, Heath, Paige, & Ward, 1987; Perev, 2011). We then set the encoders to be unit length and axis aligned and optimize each dimension independently by using the methods from section 2.2. As mentioned in the main text, we occasionally include encoders that are scattered uniformly along the surface of the hypersphere using quasi–Monte Carlo sampling—specifically, using the inverse transform method applied to the Sobol sequence with a spherical coordinate transform (Fang & Wang, 1994; Knight et al., 2016)—to visualize a distributed representation. Finally, we occasionally scale each dimension by a diagonal transformation with the th diagonal equaling where is obtained by simulating the desired system directly on a randomly sampled input. We also experimented with a diagonal transformation with the th diagonal corresponding to the reciprocal of two times the sum of the Hankel singular values (Glover & Partington, 1987) of the subsystem corresponding to . This has the effect of bounding the absolute value of each dimension above by 1 in the worst case (Khaisongkram & Banjerdpongchai, 2007). We have found that these methods of normalizing state-space models typically improve the robustness of our networks across a wide range of parameters.

It is worth noting that the mappings from section 4—with the exception of equations 4.10 and 4.13—do not alter the representation of . Disregarding these exceptions, the same choice of basis is conveniently carried over to the implemented network. Yet it is also the case that the dynamics of the system mapped by equation 4.13 do not depend on the chosen state-space model. This fact is proven implicitly in the following section by characterizing the dynamics in terms of and alone.

### A.4 Analysis of Poles Resulting from Equation 4.13

Consider the determined by equation 4.13, when the derivatives of the input signal are inaccessible. Let be the dynamics of the implemented system for this particular . Due to the ZOH assumption, in general, and so does not technically map onto (see definition ^{1}). As discussed in section 4.4, the approximation satisfies equation 4.1 only when the input is held constant. To be clear, for but not necessarily for . Thus, it is important to characterize the difference between and for general inputs.

To do so, we can examine the poles of the transfer function , which are defined as the complex roots of . The poles of a system fully define the dynamics of its state (up to a change of basis). For instance, a system is exponentially stable if and only if for all poles . Furthermore, if and only if is a pole, where denotes the eigenvalues of the state-space matrix .^{12} Therefore, we may characterize the poles of in terms of the poles of , in order to understand the behavior of the implemented system.

Interestingly, may be rewritten as , which is the ratio of the arithmetic mean to the geometric mean of , which may in turn be interpreted as a cross-entropy expression (Woodhouse, 2001).

Regardless, behaves like when . For the delay system (see equation 3.5), (in fact, the mean value across all poles achieves equality), and so we need . This predicts that a delay system using the double exponential synapse, without access to the input's derivative, must necessarily implement a delay that is proportionally longer than . Otherwise, the second-order dynamics from the synapse will dominate the system-level dynamics. We note that is longest for the case of the alpha synapse () and shortest for the case of the low-pass synapse ( as ).

### A.5 Relationship to Discretization

^{1}and the well-known problem of discretizing a linear dynamical system. A discrete LTI system (see equation 4.3) is identical to a continuous LTI system (see equation 2.7), with three adjustments: (1) the integrator is replaced by a