High-level languages (Matlab, Python) are popular in neuroscience because they are flexible and accelerate development. However, for simulating spiking neural networks, the cost of interpretation is a bottleneck. We describe a set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations. These algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language. Vectorized simulation makes it possible to combine the flexibility of high-level languages with the computational efficiency usually associated with compiled languages.
Computational modeling has become a popular approach to understand the nervous system and link experimental findings at different scales, from ion channels to behavior. Although there are several mature neural network simulators (Brette et al., 2007), which can simulate complex biophysical models, including Neuron (Carnevale & Hines, 2006) and Genesis (Bower & Beeman, 1998), or very large-scale spiking models, for example, NEST (Gewaltig & Diesmann, 2007), many scientists prefer to rely on their own simulation code, often written in Matlab. One reason might be that custom code allows more flexible development and control over the models, while in compiled simulators, using a nonstandard model involves writing an extension, which is not straightforward—for example, in NMODL for Neuron (Hines & Carnevale, 2000) or in C for NEST.
Implementing simulations using handwritten code rather than dedicated simulation software, however, has several disadvantages. First, code written by individual scientists is more likely to contain bugs than simulators that have been intensively worked on and tested. This may lead to difficulties in reproducing the results of such simulations (Djurfeldt, Djurfeldt, & Lansner, 2007; Schutter, 2008; Cannon et al., 2007). In addition, handwritten code can be more difficult to understand, which may slow the rate of scientific progress. Several initiatives are underway to develop standardized markup languages for simulations in computational neuroscience (Goddard et al., 2001; Garny et al., 2008; Morse, 2007) in order to address issues of reproducibility. However, these do not address the issue of flexibility, which has driven many scientists to write their own code. To address this issue, we developed a simulator written in Python (Goodman & Brette, 2009), a high-level interpreted language with dynamic typing. This allows users to define their own custom model in the script and easily interact with the simulation, while benefiting from a variety of libraries for scientific computing and visualization that are freely available. All of the major simulation packages now include Python interfaces (Eppler, Helias, Muller, Diesmann, & Gewaltig, 2008; Hines, Davison, & Muller, 2009), and the PyNN project (Davison et al., 2008) is working toward providing a unified interface to them, but these are only interfaces to the compiled code of the simulators and therefore do not solve the problem of flexibility. In the Brian simulator, the core of simulations is written in Python. This allows users to easily define new models, but the obvious problem with this choice is that programs usually run much more slowly with an interpreted language than with a compiled language. 
To address this problem, we developed vectorization algorithms that minimize the number of interpreted operations, so that the performance approaches that of compiled simulators.
Vectorizing code is the strategy of minimizing the amount of time spent in interpreted code compared to highly optimized array functions, which operate on a large number of values in a single operation. For example, the loop for i in range(0,N): z[i]=x[i]+y[i] can be vectorized as z=x+y, where x, y, and z are vectors of length N. In the original code, the number of interpreted operations is proportional to N. In the vectorized code, there are only two interpreted operations and two vectorized ones. Thus, if N is large, interpretation takes a negligible amount of time compared to compiled vector-based operations. In Python with the Numpy scientific package, the operation above is about 40 times faster when it is vectorized for N=1000. In clock-driven simulations, all neurons are synchronously updated at every tick of the clock. If all models have the same functional form (but possibly different parameter values), then the same operations are executed many times with different values. Thus, these operations can be easily vectorized, and if the network contains many neurons, then the interpretation overhead becomes negligible.
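The contrast above can be made concrete with NumPy (a minimal sketch; the 40-times figure depends on the machine and NumPy version):

```python
import numpy as np

N = 1000
x = np.arange(N, dtype=float)
y = np.ones(N)

# Interpreted loop: the number of interpreted operations grows with N
z_loop = np.empty(N)
for i in range(N):
    z_loop[i] = x[i] + y[i]

# Vectorized: one interpreted operation dispatching to compiled array code
z_vec = x + y
```
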
We start by presenting data structures that are appropriate for vectorized algorithms (see section 2). We then show how to build a network in a vectorized way (see section 3), before presenting our vectorized algorithms for neural simulation (see section 4). These include state updates, threshold and reset, spike propagation, delays, and synaptic plasticity (see section 5). Similar data structures and algorithms were already used in previous simulators (Brette et al., 2007), but not in the context of vectorization.
In this paper, we illustrate the structures and algorithms with code written in Python using the Numpy library for vector-based operations, but most algorithms can also be adapted to Matlab or Scilab. The algorithms are implemented in the Brian simulator, which can be downloaded from http://briansimulator.org. It is open source and available for free under the CeCILL license (compatible with the GPL license). Appendix B shows the Python code for a complete example simulation, which is independent of the Brian simulator.
2. Models and Data Structures
2.1. Vectorized Containers.
We use several combinations of some basic data structures designed for easy vectorization. These structures are illustrated in Figure 1.
2.1.1. Dynamic Arrays.
A dynamic array is a standard data structure (used, for example, in most implementations of the C++ STL vector class and for Python's list class) that allows both fast indexing and resizing. It is a linear array with some possibly unused space at the end. When we want to append a new element to the dynamic array, we place the element there if there is unused space; otherwise, we resize the array to make space for the new elements. In detail, it is an array val of length n in which the first m elements are used and the remainder are free. When an element is appended to the dynamic array, if m<n, then the new element is simply placed in val[m], and m is increased by 1. If m=n (the array is full), then the array is first resized to an array of length αn (for some constant α>1) at cost O(n). Thus, every time the array is resized, the operation costs O(n) and leaves space for (α−1)n extra elements. Therefore, the average cost per extra element is O(1/(α−1))=O(1) (this is sometimes referred to as the amortized cost per element).
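A minimal sketch of this structure in Python with NumPy (class and attribute names are ours, for illustration only):

```python
import numpy as np

class DynamicArray:
    """Dynamic array with amortized O(1) append (illustrative sketch)."""
    def __init__(self, alpha=2.0):
        self.val = np.zeros(4)   # allocated storage (length n)
        self.m = 0               # number of used elements
        self.alpha = alpha       # growth factor (alpha > 1)

    def append(self, x):
        n = len(self.val)
        if self.m == n:                       # array is full: resize at cost O(n)
            newval = np.zeros(int(self.alpha * n))
            newval[:n] = self.val             # copy old contents
            self.val = newval
        self.val[self.m] = x
        self.m += 1

    def data(self):
        return self.val[:self.m]              # view of the used portion
```
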
2.1.2. Circular Arrays.
A circular array x is an array that wraps around at the end and can be rotated at essentially no cost. It is implemented as an array y of size N and a cursor variable indicating the current origin of the array. Indexing x is circular in the sense that x[i+N]=x[i]=y[(i+cursor)%N], where % is the modulus operator, giving the remainder after division. Rotating the array corresponds to simply changing the cursor variable. Circular arrays are convenient for storing data that need to be kept for only a limited amount of time. New data are inserted at the origin, and the array is then rotated. After several such insertions, the data are rotated back to the origin and overwritten by new data. The benefits of this type of array for vectorization are that insertion and indexing are inexpensive, and it does not involve frequent allocations and deallocations of memory (which would cause memory fragmentation and cache misses as data locality would be lost).
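A circular array can be sketched in a few lines of Python (names are illustrative, not Brian's):

```python
import numpy as np

class CircularArray:
    """Fixed-size array with circular indexing; rotation only moves the cursor."""
    def __init__(self, N):
        self.y = np.zeros(N)     # underlying storage
        self.cursor = 0          # current origin of the array

    def __getitem__(self, i):
        return self.y[(i + self.cursor) % len(self.y)]

    def __setitem__(self, i, value):
        self.y[(i + self.cursor) % len(self.y)] = value

    def rotate(self, k=1):
        # O(1): no data is moved, only the origin changes
        self.cursor = (self.cursor + k) % len(self.y)
```

After inserting a value at the origin and rotating, the value is accessible at index −1, illustrating how recently inserted data stay reachable at negative indices.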
2.1.3. Cylindrical Arrays.
A cylindrical array X is a two-dimensional version of a circular array. It has the property that X[t+M, i] = X[t, i], where i is a noncircular index (typically a neuron index in our case), t is a circular index (typically representing time), and M is the size of the circular dimension. In the same way as for circular arrays, it is implemented as a two-dimensional (2D) array Y with a cursor for the first index, where the cursor points to the origin. Cylindrical arrays are then essentially circular arrays in which an array can be stored instead of a single value. The most important property of cylindrical arrays for vectorization is that multiple elements of data can be stored at different indices in a vectorized way. For example, if we have an array V of values and want to store each element of V in a different position in the circular array, this can be done in a vectorized way as follows. To perform n assignments X[t_k, i_k]=v_k (1 ≤ k ≤ n), we can execute a single vectorized operation: X[T, I]=V, where T=(t_1, ..., t_n), I=(i_1, ..., i_n), and V=(v_1, ..., v_n) are three vectors. For the underlying array, it means Y[(T+cursor)%M, I]=V.
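The vectorized scatter operation described above is a single NumPy fancy-indexing assignment (values here are illustrative):

```python
import numpy as np

M, N = 8, 5                    # circular (time) and linear (neuron) dimensions
Y = np.zeros((M, N))           # underlying 2D array
cursor = 0                     # origin of the circular dimension

# Store V[k] at circular position (T[k], I[k]) in one vectorized assignment
T = np.array([2, 3, 2])        # time offsets
I = np.array([0, 1, 4])        # neuron indices
V = np.array([1.0, 2.0, 3.0])
Y[(T + cursor) % M, I] = V
```
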
2.2. The State Matrix.
It follows from the hybrid system formulation that the state of a neuron can be stored in a vector X. In general, the dimension of this vector is at least the number of synapses. However, in many models used in the literature, synapses of the same type share the same linear dynamics, so that they can be reduced to a single set of differential equations per synapse type, where all spikes coming from synapses of the same type act on the same variable (Lytton, 1996; Song et al., 2000; Morrison, Mehring, Geisel, Aertsen, & Diesmann, 2005; Brette et al., 2007), as in the previous example. In this case, which we will focus on, the dimension of the state vector does not depend on the number of synapses, only on the number of synapse types, which is assumed to be small (e.g., excitatory and inhibitory). Consider a group of N neurons with the same m state variables and the same dynamics (the same differential equations). Then updating the states of these neurons involves repeating the same operations N times, which can be vectorized. We will then define a state matrix for this group made of m rows of length N, each row defining the values of the same state variable for all N neurons. We choose this matrix orientation (rather than N rows of length m) because the vector describing one state variable for the whole group is stored as a contiguous array in memory, which is more efficient for vector-based operations. It is still possible to have heterogeneous groups in this way if parameter values (e.g., the membrane time constant) are different between neurons. In this case, parameters can be treated as state variables and included in the state matrix.
The state of all neurons in a model is then defined by a set of state matrices, one for each neuron type, where the type is defined by the set of differential equations, threshold conditions, and resets. Each state matrix has a small number of rows (the number of state variables) and a large number of columns (the number of neurons). Optimal efficiency is achieved with the smallest number of state matrices. Thus, neurons should be gathered in groups according to their models rather than functionally (e.g., layers).
2.3. Connectivity Structures.
2.3.1. Structures for Network Construction and Spike Propagation.
When a neuron spikes, the states of all target neurons are modified (possibly after a delay), usually by adding a number to one state variable (the synaptic weight). Thus, the connectivity structure must store, for each neuron, the list of target neurons, synaptic weights, and possibly transmission delays. Other variables could be stored, such as transmission probability, but these would involve the same sort of structures.
Suppose we want to describe the connectivity of two groups of neurons P and Q (possibly identical or overlapping), where neurons from P project to neurons in Q. Because the structure should be appropriate for vectorized operations, we assume that each group is homogeneous (same model, but possibly different parameters), and presynaptic spikes from P all act on the same state variable in all neurons in Q. These groups could belong to larger groups. When a spike is emitted, the program needs to retrieve the list of target neurons and their corresponding synaptic weights. This is most efficient if these lists are stored as contiguous vectors in memory.
The simplest case is when these groups are densely connected. Synaptic weights can then be stored in a (dense) matrix W, where Wi,j is the synaptic weight between neuron i in P and neuron j in Q. Here we are assuming that a given neuron does not make multiple synaptic contacts with the same postsynaptic neuron. If it does, several matrices could be used. Thus, the synaptic weights for all target neurons of a neuron in P are stored as a row of W, which is contiguous in memory. Delays can be stored in a similar matrix. If connectivity is sparse, a dense matrix is not efficient. Instead, a sparse matrix should be used (see Figure 2). Because we need to retrieve rows when spikes are produced, the appropriate format is a list of rows, where each row is defined by two vectors: one defining the list of target neurons (as indices in Q) and one defining the list of synaptic weights for these neurons. When the network is being constructed, these are best implemented as chained lists (list of lists, or LIL format), so that one can easily insert new synapses. But for spike propagation, it is more efficient to implement this structure with fixed arrays, following the compressed sparse row (CSR) format. The standard CSR matrix consists of three arrays: val, col, and rowptr. The nonzero values of the matrix are stored in the array val row by row, that is, starting with all nonzero values in row 0, then in row 1, and so on. This array has length nnz, the number of nonzero elements in the matrix. The array col has the same structure, but contains the column indices of all nonzero values, that is, the indices of the postsynaptic neurons. Finally, the array rowptr contains the position for each row in the arrays val and col, so that row i is stored in val[j] at indices j with rowptr[i] ≤ j < rowptr[i+1]. The memory requirement for this structure, assuming values are stored as 64 bit floats and indices are stored as 32 bit integers, is approximately 12·nnz bytes.
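A small worked instance of the CSR layout (the matrix and weights are made up for illustration):

```python
import numpy as np

# CSR representation of a 3x4 weight matrix with 5 nonzeros:
# row 0 -> targets [1, 3], row 1 -> target [0], row 2 -> targets [1, 2]
val    = np.array([0.5, 0.2, 0.1, 0.7, 0.3])   # weights, stored row by row (length nnz)
col    = np.array([1, 3, 0, 1, 2])             # postsynaptic (column) indices
rowptr = np.array([0, 2, 3, 5])                # row i occupies rowptr[i]:rowptr[i+1]

def row(i):
    """Targets and weights of presynaptic neuron i, as contiguous slices."""
    j0, j1 = rowptr[i], rowptr[i + 1]
    return col[j0:j1], val[j0:j1]
```

Because each row is a contiguous slice of val and col, retrieving the targets of a spiking neuron costs a single slicing operation.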
This compressed format can be converted from the list format just before running the simulation (this takes about the same time as construction for 10 synapses per neuron and much less for denser networks).
As for the neuron groups, there should be as few connection matrices as possible to minimize the number of interpreted operations. Thus, synapses should be gathered according to synapse type (which state variable is modified) and neuron type (pre- and postsynaptic neuron groups) rather than functionally. Connectivity between subgroups of the same type (e.g., layers) should be defined using submatrices (with array views).
2.3.2. Structures for Spike-Timing-Dependent Plasticity.
In models of spike-timing-dependent plasticity (STDP), both presynaptic spikes and postsynaptic spikes act on synaptic variables. In particular, when a postsynaptic spike is produced, the synaptic weights of all presynaptic neurons are modified, which means that a column of the connection matrix must be accessed for both reading and writing. This is straightforward if this matrix is dense, but more difficult if it is sparse and stored as a list of rows.
For fast column access, we augment the standard CSR matrix to form what we call a compressed sparse row and column (CSRC) matrix (see Figure 2). We add three additional arrays index, row, and colptr to the CSR matrix. These play essentially the same roles as val, col, and rowptr, except that they are column oriented rather than row oriented, as in a compressed sparse column (CSC) matrix, and instead of storing values in val, we store the indices of the corresponding values in index. So column j consists of all elements with row indices row[i] and values val[index[i]], for colptr[j] ≤ i < colptr[j+1]. This allows us to perform vectorized operations on columns in the same way as on rows. However, the memory requirements are now approximately 20·nnz bytes, and column access will be slower than row access because of the indirection through the index array and because the data layout is not contiguous (which will reduce cache efficiency). The memory requirements could be reduced to as low as approximately 10·nnz bytes by using 32 bit floats and 16 bit indices (as long as the number of rows and columns were both less than 2^16 = 65,536).
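The column-oriented arrays can be illustrated on the same small matrix as above (values are made up; array names follow the text):

```python
import numpy as np

# CSRC = CSR plus column-oriented arrays (index, row, colptr), for a 3x4 matrix
# with nonzeros at (0,1)=0.5, (0,3)=0.2, (1,0)=0.1, (2,1)=0.7, (2,2)=0.3
val    = np.array([0.5, 0.2, 0.1, 0.7, 0.3])
col    = np.array([1, 3, 0, 1, 2])
rowptr = np.array([0, 2, 3, 5])

# Nonzeros sorted by column are (1,0), (0,1), (2,1), (2,2), (0,3),
# which sit at positions 2, 0, 3, 4, 1 in val:
index  = np.array([2, 0, 3, 4, 1])   # positions in val, column by column
row    = np.array([1, 0, 2, 2, 0])   # presynaptic (row) indices
colptr = np.array([0, 1, 3, 4, 5])   # column j occupies colptr[j]:colptr[j+1]

def column(j):
    """Row indices and weights of column j (one extra indirection via index)."""
    i0, i1 = colptr[j], colptr[j + 1]
    return row[i0:i1], val[index[i0:i1]]
```

Writing through the same indirection, val[index[i0:i1]] += ..., gives the vectorized column modification needed for STDP.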
2.4. Structures for Storing Spike Times.
In order to implement delays and refractoriness, we need a structure to store the indices of neurons that have spiked recently (see Figure 3). In order to minimize memory use, this structure should retain these indices only for a fixed period (the maximum delay or refractory period). The standard structure for this sort of behavior is a circular array (see section 2.1.2), which has been previously used for this purpose in nonvectorized simulations (Morrison et al., 2005; Izhikevich, 2006). Each time step, we insert the indices of all neurons that spiked in the group: x[0:len(spikes)]=spikes, and shift the cursor by len(spikes). After insertion, the most recent spikes are always in x[i] for negative i. To extract the spikes produced at a given time step in the past, we need to store a list of the locations of the spikes stored at these time steps in the underlying array y. This is also implemented in a circular array of length duration, where duration is the number of time steps we want to remember spikes for. With this array, we can insert new spikes and extract the set of indices of neurons that spiked at any given time (within duration) at a cost of only O(n), where n is the number of spikes returned. The algorithm is vectorized over spikes. The length of the circular array x depends a priori on the number of spikes produced, which is unknown before the simulation. Thus, we use a dynamic array (see section 2.1.1) to resize the spike container if necessary. As in the case of the standard dynamic array, this makes insertion an amortized O(1) operation.
The Python code for this structure is shown in appendix A.
3. Network Construction

With sparse matrices, both the synaptic weights and the list of target neurons must be initialized. A typical case is that of random connectivity, where any given pair of neurons has a fixed probability x of being connected and synaptic weights are constant. Again, the connection matrix can be initialized row by row as follows. Suppose the presynaptic group has N neurons and the postsynaptic group has M neurons. For each row of the connection matrix (i.e., each presynaptic neuron), pick the number k of target neurons at random according to a binomial distribution with parameters M and x. Then pick k indices at random between 0 and M−1. These indices are the list of target neurons, and the list of synaptic weights is a constant vector of length k (in Python, [w]*k is a list of k elements [w, w, ..., w]). With this algorithm, constructing each row involves a constant number of vector-based operations. Clearly, synaptic weights need not be constant, and their construction can also be vectorized as previously shown.
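The row-by-row construction can be sketched as follows (a NumPy sketch, independent of Brian; we sample target indices without replacement so that a neuron makes at most one contact per postsynaptic neuron):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_connectivity(N, M, x, w):
    """Random connectivity (probability x, constant weight w), built row by row
    as LIL-style lists of target-index arrays and weight arrays."""
    targets, weights = [], []
    for _ in range(N):                     # one iteration per presynaptic neuron
        k = rng.binomial(M, x)             # number of postsynaptic targets
        targets.append(rng.choice(M, size=k, replace=False))  # target indices
        weights.append(np.full(k, w))      # constant weights, like [w]*k
    return targets, weights

targets, weights = random_connectivity(N=100, M=200, x=0.1, w=0.5)
```

Each row costs a constant number of vector-based operations, so construction time is dominated by the compiled sampling routines rather than interpretation.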
4. Simulation

Simulating a network for one time step consists of (1) updating the state variables, (2) checking threshold conditions and resetting neurons, and (3) propagating spikes. This process is illustrated in Figure 4. We now describe vectorization algorithms for each of these steps, leaving synaptic plasticity to section 5.
4.1. State Updates.
Between spikes, neuron states evolve continuously according to a set of differential equations. Updating the states consists of calculating the value of the state matrix at the next time step S(t+dt), given the current value S(t). Vectorizing these operations is straightforward, as we show in the next paragraphs. Figure 5 shows that with these vectorized algorithms, the interpretation time indeed vanishes for large neuron groups.
4.1.1. Linear Equations.
A special case is when the differential system is linear. In this case, the state vector of a neuron at time t+dt depends linearly on its state vector at time t. Therefore, the following equality holds: X(t+dt)=AX(t)+B, where A and B are determined by the differential system and can be calculated at the start of the simulation (Hirsch & Smale, 1974; Rotter & Diesmann, 1999; Morrison, Straube, Plesser, & Diesmann, 2007).
Each state vector is a column of the state matrix S. It follows that the update operation for all neurons in the group can be compactly written as a matrix product: S(t+dt)=AS(t)+BJ^T, where J is a vector of ones (with N elements).
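A minimal sketch of this update for a group of N neurons. In general, A = exp(M·dt) for the system matrix M, which requires a matrix exponential routine; here we assume a decoupled two-variable system (two leaky variables with time constants tau_v and tau_w, no constant input), so A is diagonal and can be written down directly:

```python
import numpy as np

N = 1000
dt, tau_v, tau_w = 0.1e-3, 10e-3, 30e-3

# Exact update matrix for two decoupled leaky variables (illustrative choice)
A = np.diag([np.exp(-dt / tau_v), np.exp(-dt / tau_w)])
B = np.zeros(2)                        # no constant input in this example

S = np.vstack([np.full(N, -60.0),      # state matrix (2 x N): row 0 = V, row 1 = w
               np.zeros(N)])
J = np.ones(N)

# One vectorized update step for the whole group: S(t+dt) = A S(t) + B J^T
S = A @ S + np.outer(B, J)
```
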
4.1.2. Nonlinear Equations.
When equations are not linear, numerical integration schemes such as Euler or Runge-Kutta must be used. Although these updates cannot be written as a matrix product, vectorization is straightforward: it requires us only to substitute state vectors (rows of matrix S) for state variables in the update scheme. For example, the Euler update x(t+dt)=x(t)+f(x(t))dt is replaced by X(t+dt)=X(t)+f(X(t))dt, where the function f now acts element-wise on the vector X.
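For instance, a vectorized Euler step for a quadratic model (the model and parameter values are illustrative, not from the text):

```python
import numpy as np

# dV/dt = (V**2 + I) / tau, integrated with one vectorized Euler step
N, dt, tau = 1000, 0.1e-3, 10e-3
V = np.full(N, -0.5)               # state vector (one row of S)
I = np.linspace(0.0, 1.0, N)       # per-neuron input currents

def f(V):
    return (V ** 2 + I) / tau      # acts element-wise on the whole vector

V = V + f(V) * dt                  # the scalar Euler rule applied to all neurons
```
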
4.1.3. Stochastic Equations.
Stochastic differential equations can be vectorized similarly by substituting vectors of random values for random values in the stochastic update scheme. For example, a stochastic Euler update would read X(t+dt) = X(t) + f(X(t))dt + σ√dt N(t), where N(t) is a vector of normally distributed random numbers. There is no specific difficulty in vectorizing stochastic integration methods.
4.2. Threshold, Reset, and Refractoriness.
Spikes are produced when a threshold condition is met, defined on the state variables. A typical condition would be X0 > θ for an integrate-and-fire model (where Vm=X0), or X0 > X1 for a model with adaptive threshold (where X1 is the threshold variable) (Deneve, 2008; Platkiewicz & Brette, 2010). Vectorizing this operation is straightforward: (1) apply a Boolean vector operation on the corresponding rows of the state matrix, for example, S[0, :]>theta or S[0, :]>S[1, :]; (2) transform the Boolean vector into an array of indices of true elements, which is also a generic vector-based operation. This would be typically written as spikes=find(S[0, :]>theta), which outputs the list of all neurons that must spike in this time step.
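In NumPy, find corresponds to np.nonzero; a minimal sketch with made-up values:

```python
import numpy as np

# Vectorized threshold test for a group of 10 neurons
N, theta = 10, -50.0
S = np.zeros((2, N))
S[0, :] = -60.0                  # membrane potentials below threshold
S[0, [2, 5, 7]] = -45.0          # three neurons pushed above threshold

crossed = S[0, :] > theta        # (1) Boolean vector operation
spikes = np.nonzero(crossed)[0]  # (2) indices of true elements ("find")
```
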
In the same way as the threshold operation, the reset consists of a number of operations on the state variables (Xi ← fi(X)) for all neurons that spiked in the time step. This implies (1) selecting a submatrix of the state matrix, where columns are selected according to the spikes vector (indices of spiking neurons), and (2) applying the reset operations on rows of this submatrix. For example, the vectorized code for a standard integrate-and-fire model would read: S[0, spikes]=Vr, where Vr is the reset value. For an adaptive threshold model, the threshold would also increase by a fixed amount: S[1, spikes]+=a.
Absolute refractoriness in spiking neuron models generally consists of clamping the membrane potential at a reset value Vr for a predefined duration. To implement this mechanism, we need to extract the indices of all neurons that have spiked in the last k time steps. These indices are stored contiguously in the circular array defined in section 2.4. To extract this subvector refneurons, we need only to obtain the starting index in the circular array where the spikes produced in the kth previous time step were stored. Resetting then amounts to executing the following vectorized code: S[0, refneurons]=Vr.
This implicitly assumes that all neurons have the same refractory period. For heterogeneous refractoriness, we can store an array refractime of length the number of neurons, which gives the next time at which the neuron will stop being refractory. If the refractory periods are given as an array refrac and at time t neurons with indices in the array spikes fire, we update refractime[spikes]=t+refrac[spikes]. At any time, the indices of the refractory neurons are given by the Boolean array refneurons=refractime>t. Heterogeneous refractory times are more computationally expensive than homogeneous ones because the latter operation requires N comparisons (for N the number of neurons) rather than just picking out directly the indices of the neurons that have spiked (on average N·F·refrac indices, where F is the average firing rate and refrac the refractory period). A mixed solution that is often very efficient in practice (although no better in the worst case) is to test the condition refneurons=refractime>t only for neurons that fired within the most recent period max(refrac) (which may include repeated indices). If the array of these indices is called candidates, the code reads: refneurons=candidates[refractime[candidates]>t]. The mixed solution can be chosen when the size of the array candidates is smaller than the number of neurons (which is not always true if neurons can fire several times within the maximum refractory period).
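The heterogeneous scheme can be sketched with made-up values (times in seconds):

```python
import numpy as np

N, t = 8, 0.020
refrac = np.array([0.002, 0.005, 0.002, 0.010, 0.002, 0.005, 0.002, 0.010])
refractime = np.zeros(N)          # next time each neuron stops being refractory

spikes = np.array([1, 3, 6])      # neurons firing at time t
refractime[spikes] = t + refrac[spikes]

# 3 ms later, the refractory neurons are those with refractime > current time
t2 = t + 0.003
refneurons = np.nonzero(refractime > t2)[0]
```

Neuron 6 (refractory period 2 ms) has already recovered at t2, while neurons 1 and 3 are still refractory.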
4.3. Propagation of Spikes.
Here we consider the propagation of spikes with no transmission delays, which are treated in the next section. Consider synaptic connections between two groups of neurons where presynaptic spikes act on postsynaptic state variable k. In a nonvectorized way, the spike propagation algorithm can be written as a double loop: for each neuron i in spikes, for each neuron j in postsynaptic[i], do S[k, j] += W[i, j], where spikes is the list of indices of neurons that spike in the current time step and postsynaptic[i] is the list of postsynaptic neurons of neuron i (with no repetition). The inner loop can be vectorized in the following way: S[k, :]+=W[i, :]. For sparse matrices, this would be implemented as S[k, row_target[i]]+=row_val[i], where row_target[i] is the array of indices of neurons that are postsynaptic to neuron i and row_val[i] is the array of nonzero values in row i of the weight matrix.
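A sparse-matrix sketch with made-up weights (3 presynaptic, 4 postsynaptic neurons):

```python
import numpy as np

# Row-oriented sparse connectivity: targets and weights per presynaptic neuron
row_target = [np.array([1, 3]), np.array([0]), np.array([1, 2])]
row_val    = [np.array([0.5, 0.2]), np.array([0.1]), np.array([0.7, 0.3])]

S = np.zeros((2, 4))       # postsynaptic state matrix
k = 1                      # synapses act on state variable k
spikes = [0, 2]            # presynaptic neurons spiking this time step

for i in spikes:                          # interpreted loop over spikes only
    S[k, row_target[i]] += row_val[i]     # vectorized over that neuron's synapses
```

Only the outer loop over spiking neurons is interpreted, so the overhead per time step vanishes when the number of synapses per neuron is large.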
Figure 6 shows that, as expected, the interpretation in this algorithm is negligible when the number of synapses per neuron is large.
4.4. Transmission Delays.
4.4.1. Homogeneous Delays.
Implementing spike propagation with homogeneous delays between two neuron groups is straightforward using the structure defined in section 2.4. The same algorithms as described above can be implemented, with the only difference that vectorized state modifications are iterated over spikes in a previous time step (extracted from the spike container) rather than in the current time step.
4.4.2. Heterogeneous Delays.
Heterogeneous delays cannot be simulated in the same way. Morrison et al. (2005) introduced the idea of using a queue for state modifications on the postsynaptic side, implemented as a circular array (see section 2.1.2). For an entire neuron group, this can be vectorized by combining these queues to form a cylindrical array E[t, i] where i is a neuron index and t is a time index. Figure 7 illustrates how this cylindrical array can be used for propagating spikes with delays. At each time step, the row of the cylindrical array corresponding to the current time step is added to the state variables of the target neurons: S[k, :]+=E[0, :]. Then the values in the cylindrical array at the current time step are reset to 0 (E[0, :] = 0) so that they can be reused, and the cursor variable is incremented. Emitted spikes are inserted at positions in the cylindrical array that correspond to the transmission delays; that is, if the connection from neuron i to j has transmission delay d[i, j], then the synaptic weight is added to the cylindrical array at position E[d[i, j], j]. This addition can be vectorized over the postsynaptic neurons j as described in section 2.1.3.
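A sketch of the event queue as a cylindrical array (delays are in time steps; weights and indices are made up for illustration):

```python
import numpy as np

M, N = 10, 4                    # maximum delay (in steps) and number of neurons
E = np.zeros((M, N))            # cylindrical event array (circular in time)
cursor = 0                      # origin of the circular dimension
S = np.zeros((2, N))            # postsynaptic state matrix
k = 1                           # spikes act on state variable k

# Presynaptic neuron spikes: weights land at positions given by their delays,
# vectorized over postsynaptic targets
targets = np.array([0, 1, 3])
delays  = np.array([2, 5, 2])   # transmission delays in time steps
weights = np.array([0.5, 0.2, 0.7])
E[(delays + cursor) % M, targets] += weights

# Each subsequent time step: deliver events due now, clear the slot, advance
for _ in range(3):
    S[k, :] += E[cursor, :]     # add the current row to the state variables
    E[cursor, :] = 0.0          # reset so the slot can be reused
    cursor = (cursor + 1) % M
```

After three steps, the delay-2 events (to neurons 0 and 3) have been delivered, while the delay-5 event (to neuron 1) is still queued.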
5. Synaptic Plasticity
5.1. Short-Term Plasticity.
In models of short-term synaptic plasticity, synaptic variables (e.g., available synaptic resources) depend on presynaptic activity but not postsynaptic activity. It follows that if the parameter values are the same for all synapses, then it is enough to simulate the plasticity model for each neuron in the presynaptic group rather than for each synapse (see Figure 8).
The differential equations are equivalent to the state update of a neuron group with variables x and u, while the discrete events are equivalent to a reset. Therefore, the synaptic model can be simulated as a neuron group, using the vectorized algorithms that we previously described. The only change is that spike propagation must be modified so that synaptic weights are multiplicatively modulated by ux. This simply amounts to replacing the operation S[k,:]+=W[i,:] by S[k,:]+=W[i,:]*u[i]*x[i]. State updates could be further accelerated with event-driven updates, since their values are required only at spike times, as described in Markram et al. (1998) and implemented in, for example, NEST (Morrison et al., 2008).
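The modulated propagation step can be sketched as follows (one u and one x per presynaptic neuron; the weights and variable values are illustrative):

```python
import numpy as np

# 3 presynaptic, 4 postsynaptic neurons; STP variables per presynaptic neuron
W = np.array([[0.5, 0.0, 0.2, 0.0],
              [0.0, 0.1, 0.0, 0.0],
              [0.7, 0.0, 0.0, 0.3]])
u = np.array([0.2, 0.5, 0.4])    # utilization variable
x = np.array([1.0, 0.8, 0.5])    # available synaptic resources

S = np.zeros((2, 4))
k = 1
spikes = [0, 2]                  # presynaptic neurons spiking this time step

for i in spikes:
    # weights multiplicatively modulated by u*x of the presynaptic neuron
    S[k, :] += W[i, :] * u[i] * x[i]
```
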
However, if the parameter values of the synaptic model (e.g., the time constants of u and x, or U) are heterogeneous, or if spike transmission is probabilistic and the updates of u and x occur only at successful transmissions, then the values of the synaptic variables are different even when they share the same presynaptic neuron. In this case, we must store and update as many variables as synapses. Updates could still be vectorized (e.g., over synapses), but the simulation cost is much higher.
5.2. Spike-Timing-Dependent Plasticity.
In the synaptic plasticity model described above, the value of the synaptic variable Apre (resp. Apost) depends on only presynaptic (resp. postsynaptic) activity. In this case, we say that the synaptic rule is separable, that is, all synaptic variables can be gathered into two independent pools of presynaptic and postsynaptic variables. Assuming that the synaptic rule is the same for all synapses (same parameter values), this implies that the value of Apre (resp. Apost) is identical for all synapses that share the same presynaptic (resp. postsynaptic) neuron. Therefore, in the same way as for short-term plasticity, we need only to simulate as many presynaptic (resp. postsynaptic) variables Apre (resp. Apost) as presynaptic (resp. postsynaptic) neurons. Simulating the differential equations that define the synaptic rule can thus be done in exactly the same way as for simulating state updates of neuron groups, and the operations on synaptic variables at spike times (e.g., Apre ← Apre + ΔApre) can be simulated as resets. However, the modification of synaptic weights requires specific vectorized operations.
When a presynaptic spike is produced by neuron i, the synaptic weights for all postsynaptic neurons j are modified by an amount Apost[j] (remember that there is only one postsynaptic variable per neuron in the postsynaptic group). This means that the entire row i of the weight matrix is modified. If the matrix is dense, the vectorized code would simply read W[i, :]+=A_post, where A_post is a vector. Similarly, when a postsynaptic spike is produced, a column of the weight matrix is modified: W[:, i]+=A_pre. This is not entirely vectorized since one needs to loop over pre- and postsynaptic spikes, but the interpretation overhead vanishes for a large number of synapses per neuron. If the matrix is sparse, the implementation is slightly different since only the nonzero elements must be modified. If row_target[i] is the array of indices of neurons that are postsynaptic to neuron i and row_val[i] is the array of nonzero values in row i, then the code should read: row_val[i]+=A_post[row_target[i]]. The symmetrical operation must be done for postsynaptic spikes, which implies that the weight matrix can be accessed by column (see section 2.3.2).
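The sparse row update can be sketched with made-up traces and weights (the trace values stand in for Apre and Apost at the time of the spike):

```python
import numpy as np

# Row-oriented sparse weights: 2 presynaptic, 4 postsynaptic neurons
row_target = [np.array([1, 3]), np.array([0, 1])]      # targets per row
row_val    = [np.array([0.50, 0.20]), np.array([0.10, 0.40])]

A_pre  = np.array([0.01, 0.02])                  # one trace per presynaptic neuron
A_post = np.array([-0.03, -0.01, 0.00, -0.02])   # one trace per postsynaptic neuron

# Presynaptic spike from neuron 0: modify only its nonzero weights,
# vectorized over that row's synapses
i = 0
row_val[i] += A_post[row_target[i]]
```

The symmetrical update at postsynaptic spikes goes through the column-oriented arrays of the CSRC structure, adding A_pre[row indices] to the weights of one column.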
5.2.1. STDP with Delays.
When transmission delays are considered, the timing of pre- and postsynaptic spikes in the STDP rule should be defined from the synapse's point of view. If the presynaptic neuron spikes at time tpre, then the spike timing at the synapse is tpre+da, where da is the axonal propagation delay. If the postsynaptic neuron spikes at time tpost, then the spike timing at the synapse is tpost+dd, where dd is the dendritic backpropagation delay. In principle, both delays should be specified in addition to the total propagation delay. The latter may not be the sum da+dd because dendritic backpropagation and forward propagation are not necessarily identical. Various possibilities are discussed in Morrison et al. (2008). Presumably if dendritic backpropagation of action potentials is active (e.g., mediated by sodium channels) while the propagation of postsynaptic potentials is passive, then the former delay should be shorter than the latter one. For this reason, we choose to neglect dendritic backpropagation delays, which amounts to considering that transmission delays are purely axonal. An alternative (nonvectorized) solution requiring that dendritic delays are longer than axonal delays is given in the appendix to Morrison, Aertsen, and Diesmann (2007).
Figure 10 shows an example where N presynaptic neurons make excitatory synaptic connections to a single postsynaptic neuron. The delay from presynaptic neuron i to the postsynaptic neuron is i*d. Now suppose each presynaptic neuron fires simultaneously at time 0 and the postsynaptic neuron fires at time (N/2)d. At the synapses, presynaptic spikes arrive at times i*d, while the postsynaptic spike occurs at time (N/2)d. In this case, synapse i would experience a weight modification of f((N/2 - i)d), where f is the STDP modification function. More generally, for each pair tpre and tpost of pre- and postsynaptic firing times, with a delay d, the weight modification is f(tpost - tpre - d).
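In a vectorized implementation, this computation reduces to a single array expression over synapses. The sketch below reproduces the Figure 10 scenario; the exponential STDP window f and all parameter values are assumptions of the sketch, not given in the text:

```python
import numpy as np

N = 8        # number of presynaptic neurons (illustrative)
d = 1.0      # elementary delay (ms, illustrative)

# Delay from presynaptic neuron i to the postsynaptic neuron is i*d.
delays = np.arange(N) * d

t_pre = np.zeros(N)      # all presynaptic neurons fire at time 0
t_post = (N / 2) * d     # postsynaptic spike time

# Spike timing difference seen at each synapse; dendritic backpropagation
# delays are neglected, so delays are treated as purely axonal.
dt = t_post - t_pre - delays

# Illustrative exponential STDP window (an assumption of this sketch):
# potentiation for dt > 0, depression for dt < 0.
A_plus, A_minus, tau = 0.01, 0.012, 20.0
dw = np.where(dt > 0, A_plus * np.exp(-dt / tau), -A_minus * np.exp(dt / tau))
```

Synapses with shorter delays (i < N/2) see the presynaptic spike arrive before the postsynaptic one and are potentiated; those with longer delays are depressed.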
We have presented a set of vectorized algorithms for simulating spiking neural networks. These algorithms cover the whole range of operations required to simulate a complete network model, including delays and synaptic plasticity. Neuron models can be expressed as sets of differential equations, describing the evolution of state variables (either neural or synaptic), and discrete events, corresponding to actions performed at spike times. The integration of differential equations can be vectorized over neurons, while discrete events can be vectorized over presynaptic or postsynaptic neurons (i.e., targets of the event under consideration). Thus, if N is the number of neurons and p is the average number of synapses per neuron, then the interpretation overhead vanishes for differential equations (state updates) when N is large (see Figure 5) and for discrete events (spike propagation, plasticity) when p is large (see Figure 6). We previously showed that the Brian simulator, which implements these algorithms, is only about twice as slow as optimized custom C code for large networks (Goodman & Brette, 2008). In Figure 11, we compare our vectorized implementation of a random network (see appendix B) with a nonvectorized implementation, also written in Python (Figure 12 shows the output of the simulation). Vectorized code is faster for more than three neurons and is several hundred times faster for a large number of neurons. In principle, the algorithms could be further optimized by also vectorizing discrete events over emitted spikes, but this would probably require some specific vector-based operations (in fact, matrix-based operations).
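The source of the speed difference can be seen in miniature on the state update alone. The following sketch (illustrative sizes and time constants) compares a loop-based and a vectorized Euler step for a leaky integrator:

```python
import time
import numpy as np

N = 100_000
dt_over_tau = 0.1 / 20.0   # dt/tau for a leaky integrator (illustrative)
v = np.random.rand(N)

# Nonvectorized: one interpreted Python statement per neuron.
v_loop = v.copy()
t0 = time.perf_counter()
for i in range(N):
    v_loop[i] += dt_over_tau * (0.0 - v_loop[i])
loop_time = time.perf_counter() - t0

# Vectorized: a single NumPy statement updates all neurons, so the
# interpretation overhead is paid once rather than N times.
v_vec = v.copy()
t0 = time.perf_counter()
v_vec += dt_over_tau * (0.0 - v_vec)
vec_time = time.perf_counter() - t0
```

Both versions compute the same update; only the number of interpreted statements per time step differs.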
An interesting extension would be to port these algorithms to graphics processing units (GPUs), which are parallel coprocessors available on graphics cards. These are inexpensive units designed originally and primarily for computer games that are increasingly being used for nongraphical parallel computing (Owens et al., 2007). The chips contain multiple processor cores (512 in the current state of the art designs) and parallel programming follows the SIMD model (single instruction, multiple data). This model is particularly well adapted to vectorized algorithms. Thus, we expect that our algorithms could be ported to these chips with minimal changes, which could yield considerable improvements in simulation speed for considerably less human and financial investment than clusters. We have already implemented the state update part on GPU and applied it to a model-fitting problem, with speed improvements of 50 to 80 on the GPU compared to single CPU simulations (Rossant, Goodman, Platkiewicz, & Brette, 2010). The next stage, which is more challenging, is to simulate discrete events on GPU.
Another challenge is to incorporate integration methods that use continuous spike timings (i.e., not bound to the time grid), which several authors have recently proposed (D'Haene & Schrauwen, 2010; Hanuschkin, Kunkel, Helias, Morrison, & Diesmann, 2010). The first required addition is that spikes must be communicated along with their precise timing. This could be simply done in a vectorized framework by transmitting a vector of spike times. Dealing with these spike timings in a vectorized way seems more challenging. However, in these methods, some of the more complicated operations occur only when spikes are produced (e.g., spike time interpolation) and thus may not need to be vectorized.
Finally, we addressed only the simulation of networks of single-compartment neuron models. The simulation of multicompartmental models could be vectorized over compartments, in the same way as the algorithms we presented are vectorized over neurons. The main difficulty would be to design a vectorized algorithm for simulating the cable equation on the dendritic tree. Because this amounts to solving a linear problem for a particular matrix structure defining the neuron morphology (Hines, 1984), it seems like an achievable goal.
Appendix A: Spike Container
Appendix B: Example Simulation
The following Python code illustrates many aspects discussed in this article by implementing the simulation of a complete network of spiking neurons. The model corresponds to the CUBA benchmark described in Brette et al. (2007). It is a network of integrate-and-fire neurons with exponential excitatory and inhibitory currents and sparse random connectivity. It also includes homogeneous synaptic delays and refractoriness. The program simulates the network for 400 ms using vectorized algorithms and displays the raster plot and a sample voltage trace:
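The listing itself is not reproduced in this excerpt. As a rough indication of what such a vectorized simulation involves, here is a minimal pure-NumPy sketch of a CUBA-style network; the parameter values follow the usual CUBA benchmark but are assumptions of this sketch, and delays and plotting are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

N, Ne = 4000, 3200                  # total neurons, of which Ne excitatory
dt, T = 0.1, 400.0                  # time step and duration (ms)
taum, taue, taui = 20.0, 5.0, 10.0  # membrane and synaptic time constants (ms)
Vt, Vr, El = -50.0, -60.0, -49.0    # threshold, reset, rest (mV)
refrac = 5.0                        # refractory period (ms)
we, wi = 1.62, -9.0                 # excitatory/inhibitory weights (mV)

# Sparse random connectivity (p = 0.02), stored as one target array per
# presynaptic neuron, as in the row-based representation discussed earlier.
targets = [np.nonzero(rng.random(N) < 0.02)[0] for _ in range(N)]

v = Vr + rng.random(N) * (Vt - Vr)  # random initial potentials
ge, gi = np.zeros(N), np.zeros(N)
last_spike = np.full(N, -1e9)       # last spike time per neuron
spikes = []                         # (time, neuron) pairs for the raster plot

for step in range(int(T / dt)):
    t = step * dt
    # Vectorized Euler update of all state variables
    v += dt * (ge + gi - (v - El)) / taum
    ge -= dt * ge / taue
    gi -= dt * gi / taui
    v[t - last_spike < refrac] = Vr   # clamp refractory neurons at reset
    fired = np.nonzero(v > Vt)[0]     # thresholding, vectorized over neurons
    for i in fired:                   # propagation, vectorized over targets
        if i < Ne:
            ge[targets[i]] += we
        else:
            gi[targets[i]] += wi
    v[fired] = Vr
    last_spike[fired] = t
    spikes.extend((t, i) for i in fired)
```

The collected spikes list and a recorded voltage trace would then be passed to a plotting library to produce the raster plot and sample trace described in the text.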
We thank all those who tested early versions of Brian and made suggestions for improving it. This work was supported by the European Research Council (ERC StG 240132).