## Abstract

High-level languages (Matlab, Python) are popular in neuroscience because they are flexible and accelerate development. However, for simulating spiking neural networks, the cost of interpretation is a bottleneck. We describe a set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations. These algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language. Vectorized simulation makes it possible to combine the flexibility of high-level languages with the computational efficiency usually associated with compiled languages.

## 1. Introduction

Computational modeling has become a popular approach to understand the nervous system and link experimental findings at different scales, from ion channels to behavior. Although there are several mature neural network simulators (Brette et al., 2007), which can simulate complex biophysical models, including Neuron (Carnevale & Hines, 2006) and Genesis (Bower & Beeman, 1998), or very large-scale spiking models, for example, NEST (Gewaltig & Diesmann, 2007), many scientists prefer to rely on their own simulation code, often written in Matlab. One reason might be that custom code allows more flexible development and control over the models, while in compiled simulators, using a nonstandard model involves writing an extension, which is not straightforward—for example, in NMODL for Neuron (Hines & Carnevale, 2000) or in C for NEST.

Implementing simulations using handwritten code rather than dedicated simulation software, however, has several disadvantages. First, code written by individual scientists is more likely to contain bugs than simulators that have been intensively worked on and tested. This may lead to difficulties in reproducing the results of such simulations (Djurfeldt, Djurfeldt, & Lansner, 2007; Schutter, 2008; Cannon et al., 2007). In addition, handwritten code can be more difficult to understand, which may slow the rate of scientific progress. Several initiatives are underway to develop standardized markup languages for simulations in computational neuroscience (Goddard et al., 2001; Garny et al., 2008; Morse, 2007) in order to address issues of reproducibility. However, these do not address the issue of flexibility, which has driven many scientists to write their own code. To address this issue, we developed a simulator written in Python (Goodman & Brette, 2009), a high-level interpreted language with dynamic typing. This allows users to define their own custom model in the script and easily interact with the simulation, while benefiting from a variety of libraries for scientific computing and visualization that are freely available. All of the major simulation packages now include Python interfaces (Eppler, Helias, Muller, Diesmann, & Gewaltig, 2008; Hines, Davison, & Muller, 2009), and the PyNN project (Davison et al., 2008) is working toward providing a unified interface to them, but these are only interfaces to the compiled code of the simulators and therefore do not solve the problem of flexibility. In the Brian simulator, the core of simulations is written in Python. This allows users to easily define new models, but the obvious problem with this choice is that programs usually run much more slowly with an interpreted language than with a compiled language. 
To address this problem, we developed vectorization algorithms that minimize the number of interpreted operations, so that the performance approaches that of compiled simulators.

Vectorizing code is the strategy of minimizing the amount of time spent in
interpreted code compared to highly optimized array functions, which operate on a
large number of values in a single operation. For example, the loop for i
in range(0,N): z[i]=x[i]+y[i] can be vectorized as z=x+y, where *x*, *y*, and *z* are vectors of length *N*. In the original code, the number of interpreted operations
is proportional to *N*. In the vectorized code, there are only two
interpreted operations and two vectorized ones. Thus, if *N* is
large, interpretation takes a negligible amount of time compared to compiled
vector-based operations. In Python with the Numpy scientific package, the operation above is about 40 times faster when vectorized, for *N*=1000. In clock-driven simulations, all neurons are
synchronously updated at every tick of the clock. If all models have the same
functional form (but possibly different parameter values), then the same operations
are executed many times with different values. Thus, these operations can be easily
vectorized, and if the network contains many neurons, then the interpretation
overhead becomes negligible.
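For concreteness, the loop/vector comparison above can be reproduced with NumPy (the array contents here are arbitrary); timing the two variants, for example with the timeit module, shows the order-of-magnitude gap described:

```python
import numpy as np

N = 1000
x = np.arange(N, dtype=float)
y = np.ones(N)

# Interpreted loop: the number of interpreted operations grows with N.
z_loop = np.empty(N)
for i in range(N):
    z_loop[i] = x[i] + y[i]

# Vectorized: a constant number of interpreted operations regardless
# of N; the element-wise additions run in compiled code.
z_vec = x + y

assert np.array_equal(z_loop, z_vec)
```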

The simulation cost per second of biological time is approximately

$$\text{Cost} = c_U \frac{N}{\mathrm{d}t} + c_P F N p,$$

where *c*_{U} is the cost of one update, *c*_{P} is the cost of one spike propagation, *N* is the number of neurons, *p* is the number of synapses per neuron, *F* is the average firing rate, and d*t* is the time step (the cost is for 1 second of biological time). The principle of vectorization is to balance each interpreted operation with a vector-based operation that acts on many values, so that the proportion of time spent in interpretation vanishes as the number of neurons or synapses increases. In general, this amounts to replacing all loops by vector-based operations, but this is not a systematic rule: a loop can be kept if each operation inside the loop is vectorized. For example, if update is vectorized over neurons and propagation is vectorized over synapses (i.e., the action of one outgoing spike on all target neurons is executed in one vector-based operation), then the simulation cost is

$$\text{Cost} = (c_I + c_U N)\frac{1}{\mathrm{d}t} + (c_I + c_P p) F N,$$

where *c*_{I} is the interpretation cost. If both *N* and *p* are large, interpretation does not degrade simulation speed.

We start by presenting data structures that are appropriate for vectorized algorithms (see section 2). We then show how to build a network in a vectorized way (see section 3), before presenting our vectorized algorithms for neural simulation (see section 4). These include state updates, threshold and reset, spike propagation, delays, and synaptic plasticity (see section 5). Similar data structures and algorithms were already used in previous simulators (Brette et al., 2007), but not in the context of vectorization.

In this paper, we illustrate the structures and algorithms with code written in Python using the Numpy library for vector-based operations, but most algorithms can also be adapted to Matlab or Scilab. The algorithms are implemented in the Brian simulator, which can be downloaded from http://briansimulator.org. It is open source and available for free under the CeCILL license (compatible with the GPL license). Appendix B shows the Python code for a complete example simulation, which is independent of the Brian simulator.

## 2. Models and Data Structures

### 2.1. Vectorized Containers.

We use several combinations of some basic data structures designed for easy vectorization. These structures are illustrated in Figure 1.

#### 2.1.1. Dynamic Arrays.

A dynamic array is a standard data structure (used, for example, in most
implementations of the C++ STL vector class and for
Python's list class) that allows both fast indexing
and resizing. It is a linear array with some possibly unused space at the
end. When we want to append a new element to the dynamic array, we place the
element there if there is unused space; otherwise, we resize the array to
make space for the new elements. In detail, it is an array val of length *n* in which the first *m* elements are used and the remainder are free. When an element is appended to the dynamic array, if *m*<*n*, then the new element is simply placed in val[m], and *m* is increased by 1. If *m*=*n* (the array is full), then the array is first resized to an array of length α*n* (for some constant α>1) at cost *O*(*n*). Thus, every time the array is resized, the operation costs *O*(*n*) and leaves space for (α−1)*n* elements. Therefore, the average cost per extra element is *O*(α/(α−1)) = *O*(1) (this is sometimes referred to as the amortized cost per element).
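As a minimal sketch of this mechanism (Python's built-in list already works this way, so the class below is purely illustrative; the class name, initial size, and resizing factor are our own choices):

```python
class DynamicArray:
    """Illustrative dynamic array with amortized O(1) append."""

    def __init__(self, alpha=2.0):
        self.alpha = alpha    # resizing factor, any constant > 1
        self.n = 4            # allocated length
        self.m = 0            # number of used elements
        self.val = [None] * self.n

    def append(self, x):
        if self.m == self.n:  # array is full: resize at cost O(n)
            self.n = int(self.alpha * self.n)
            new_val = [None] * self.n
            new_val[:self.m] = self.val
            self.val = new_val
        self.val[self.m] = x  # otherwise, place element in free space
        self.m += 1

d = DynamicArray()
for i in range(100):
    d.append(i)
assert d.val[:d.m] == list(range(100))
assert d.n >= d.m             # some space at the end may be unused
```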

#### 2.1.2. Circular Arrays.

A circular array x is an array that wraps around at the end and can be rotated at essentially no cost. It is implemented as an array y of size N and a cursor variable indicating the current origin of the array. Indexing x is circular in the sense that x[i+N]=x[i]=y[(i+cursor)%N], where % is the modulus operator, giving the remainder after division. Rotating the array corresponds to simply changing the cursor variable. Circular arrays are convenient for storing data that need to be kept for only a limited amount of time. New data are inserted at the origin, and the array is then rotated. After several such insertions, the data are rotated back to the origin and overwritten by new data. The benefits of this type of array for vectorization are that insertion and indexing are inexpensive, and it does not involve frequent allocations and deallocations of memory (which would cause memory fragmentation and cache misses as data locality would be lost).
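A circular array as described above can be sketched in a few lines of Python (the class name and size are our own choices; a NumPy array serves as the underlying storage y):

```python
import numpy as np

class CircularArray:
    """Illustrative circular array with a cursor marking the origin."""

    def __init__(self, N):
        self.y = np.zeros(N)  # underlying storage
        self.N = N
        self.cursor = 0       # current origin of the array

    def __getitem__(self, i):
        return self.y[(i + self.cursor) % self.N]

    def __setitem__(self, i, value):
        self.y[(i + self.cursor) % self.N] = value

    def rotate(self, k=1):
        # Rotation only moves the cursor: no data is copied.
        self.cursor = (self.cursor + k) % self.N

x = CircularArray(5)
x[0] = 1.0                    # insert at the origin
x.rotate()                    # then rotate the array
assert x[-1] == 1.0           # the old origin is now index -1
assert x[0 + 5] == x[0]       # indexing wraps around: x[i+N] == x[i]
```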

#### 2.1.3. Cylindrical Arrays.

A cylindrical array X is a two-dimensional version of a circular array. It has the property that X[t+M, i] = X[t, i], where i is a noncircular index (typically a neuron index in our case), t is a circular index (typically representing time), and M is the size of the circular dimension. In the same way as for circular arrays, it is implemented as a two-dimensional (2D) array Y with a cursor for the first index, where the cursor points to the origin. Cylindrical arrays are then essentially circular arrays in which an array can be stored instead of a single value. The most important property of cylindrical arrays for vectorization is that multiple elements of data can be stored at different indices in a vectorized way. For example, if we have an array V of values and want to store each element of V in a different position in the circular array, this can be done in a vectorized way as follows. To perform *n* assignments X[t_k, i_k]=v_k (1 ≤ *k* ≤ *n*), we can execute a single vectorized operation: X[T, I]=V, where T=(t_1, ..., t_n), I=(i_1, ..., i_n), and V=(v_1, ..., v_n) are three vectors. For the underlying array, it means Y[(T+cursor)%M, I]=V.

### 2.2. The State Matrix.

We consider spiking neuron models defined as hybrid systems: the state of each neuron is a vector **X** that evolves continuously according to a differential system d**X**/d*t* = *f*(**X**), and spikes are produced when a threshold condition on the state is met (e.g., *V*_{m} ≥ θ, where the membrane potential *V*_{m} = *X*_{0} is the first coordinate of **X**). Spikes are propagated to target neurons, possibly with a transmission delay, where they modify the value of a state variable (*X*_{i} → *X*_{i} + *w*). For integrate-and-fire models, the membrane potential is reset when a spike is produced: *X*_{0} → *V*_{r}.

As an example, consider an integrate-and-fire model in which each presynaptic spike produces a stereotypical postsynaptic potential (PSP):

$$V(t) = V_{rest} + \sum_i w_i \sum_{t_i} \mathrm{PSP}(t - t_i),$$

where *V*(*t*) is the membrane potential, *V*_{rest} is the rest potential, *w*_{i} is the synaptic weight of synapse *i*, and *t*_{i} are the timings of the spikes coming from synapse *i*. This model can be rewritten as a hybrid system by expressing the PSP as the solution of a linear differential system. For example, consider an α-function: PSP(*t*) = (*t*/τ)*e*^{1−*t*/τ}. This PSP is the integrated formulation of the associated kinetic model

$$\tau \frac{dV}{dt} = y - V, \qquad \tau \frac{dy}{dt} = -y,$$

where presynaptic spikes act on *y* (i.e., cause an instantaneous increase in *y*: *y* → *y* + *w*), so that the model can be reformulated as follows:

$$\tau \frac{dV}{dt} = V_{rest} - V + y, \qquad \tau \frac{dy}{dt} = -y, \qquad y \to y + w \text{ at each incoming spike.}$$

Virtually all PSPs or postsynaptic currents described in the literature (e.g., bi-exponential functions) can be expressed this way. Several authors have described the transformation from phenomenological expressions to the hybrid system formalism for synaptic conductances and currents (Destexhe, Mainen, & Sejnowski, 1994a, 1994b; Rotter & Diesmann, 1999; Giugliano, 2000), short-term synaptic depression (Tsodyks & Markram, 1997; Giugliano, Bove, & Grattarola, 1999), and spike-timing-dependent plasticity (Song, Miller, & Abbott, 2000) (see also section 5). A systematic method to derive the equivalent differential system for a PSP (or postsynaptic current or conductance) is to use the Laplace transform or the Z-transform (Köhn & Wörgötter, 1998; see also Sanchez-Montanez, 2001), by seeing the PSP as the impulse response of a linear time-invariant system (which can be seen as a filter; Jahnke, Roth, & Schnauer, 1999).

It follows from the hybrid system formulation that the state of a neuron can be
stored in a vector **X**. In general, the dimension of this vector is at least the
number of synapses. However, in many models used in the literature, synapses of
the same type share the same linear dynamics, so that they can be reduced to a
single set of differential equations per synapse type, where all spikes coming
from synapses of the same type act on the same variable (Lytton, 1996; Song et al., 2000; Morrison, Mehring, Geisel, Aertsen, & Diesmann, 2005; Brette et al., 2007), as in the previous example. In this case,
which we will focus on, the dimension of the state vector does not depend on the
number of synapses, only on the number of synapse types, which is assumed to be
small (e.g., excitatory and inhibitory). Consider a group of *N* neurons with the same *m* state variables and the same dynamics
(the same differential equations). Then updating the states of these neurons
involves repeating the same operations *N* times, which can be
vectorized. We will then define a state matrix for this group made of *m* rows of length *N*, each row defining the values of the same state variable
for all *N* neurons. We choose this matrix orientation (rather
than *N* rows of length *m*) because the vector
describing one state variable for the whole group is stored as a contiguous
array in memory, which is more efficient for vector-based operations. It is
still possible to have heterogeneous groups in this way if parameter values
(e.g., the membrane time constant) are different between neurons. In this case,
parameters can be treated as state variables and included in the state
matrix.

The state of all neurons in a model is then defined by a set of state matrices, one for each neuron type, where the type is defined by the set of differential equations, threshold conditions, and resets. Each state matrix has a small number of rows (the number of state variables) and a large number of columns (the number of neurons). Optimal efficiency is achieved with the smallest number of state matrices. Thus, neurons should be gathered in groups according to their models rather than functionally (e.g., layers).

### 2.3. Connectivity Structures.

#### 2.3.1. Structures for Network Construction and Spike Propagation.

When a neuron spikes, the states of all target neurons are modified (possibly after a delay), usually by adding a number to one state variable (the synaptic weight). Thus, the connectivity structure must store, for each neuron, the list of target neurons, synaptic weights, and possibly transmission delays. Other variables could be stored, such as transmission probability, but these would involve the same sort of structures.

Suppose we want to describe the connectivity of two groups of neurons P and Q (possibly identical or overlapping), where neurons from P project to neurons in Q. Because the structure should be appropriate for vectorized operations, we assume that each group is homogeneous (same model, but possibly different parameters), and presynaptic spikes from P all act on the same state variable in all neurons in Q. These groups could belong to larger groups. When a spike is emitted, the program needs to retrieve the list of target neurons and their corresponding synaptic weights. This is most efficient if these lists are stored as contiguous vectors in memory.

The simplest case is when these groups are densely connected. Synaptic
weights can then be stored in a (dense) matrix **W**, where *W*_{i,j} is the synaptic weight between neuron *i* in P and
neuron *j* in Q. Here we are assuming that a given neuron
does not make multiple synaptic contacts with the same postsynaptic neuron.
If it does, several matrices could be used. Thus, the synaptic weights for
all target neurons of a neuron in P are stored as a row of **W**,
which is contiguous in memory. Delays can be stored in a similar matrix. If
connectivity is sparse, a dense matrix is not efficient. Instead, a sparse
matrix should be used (see Figure 2).
Because we need to retrieve rows when spikes are produced, the appropriate
format is a list of rows, where each row is defined by two vectors: one
defining the list of target neurons (as indices in Q) and one defining the
list of synaptic weights for these neurons. When the network is being
constructed, these are best implemented as chained lists (list of lists, or
LIL format), so that one can easily insert new synapses. But for spike
propagation, it is more efficient to implement this structure with fixed
arrays, following the compressed sparse row (CSR) format. The standard CSR
matrix consists of three arrays: val, col, and rowptr. The
nonzero values of the matrix are stored in the array val row by row, that is, starting with all
nonzero values in row 0, then in row 1, and so on. This array has length nnz, the number of nonzero elements in the
matrix. The array col has the same structure, but
contains the column indices of all nonzero values, that is, the indices of
the postsynaptic neurons. Finally, the array rowptr contains the position of each row in the arrays val and col, so that row *i* is stored in val[j] at indices *j* with rowptr[i] ≤ *j* < rowptr[i+1]. The memory requirement for this structure, assuming values are stored as 64-bit floats and indices are stored as 32-bit integers, is approximately 12 nnz bytes (plus 4(*N*+1) bytes for rowptr). This compressed format can be converted from the list format just before running the simulation (this takes about the same time as construction for 10 synapses per neuron and much less for denser networks).
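The LIL-to-CSR conversion and the contiguous row retrieval can be sketched in plain NumPy (the example connectivity is arbitrary; in practice one would use scipy.sparse, which implements both formats):

```python
import numpy as np

# List-of-lists (LIL) representation built during construction:
# for each presynaptic neuron, target indices and synaptic weights.
targets = [[1, 3], [0, 2, 3], [], [1]]
weights = [[0.1, 0.2], [0.3, 0.4, 0.5], [], [0.6]]

# Conversion to CSR just before the simulation runs:
rowptr = np.cumsum([0] + [len(t) for t in targets])
col = np.array([j for t in targets for j in t])
val = np.array([w for ws in weights for w in ws])

# Retrieving the targets and weights of neuron i is a contiguous slice:
i = 1
row_ind = col[rowptr[i]:rowptr[i + 1]]
row_val = val[rowptr[i]:rowptr[i + 1]]
assert list(row_ind) == [0, 2, 3]
assert np.allclose(row_val, [0.3, 0.4, 0.5])
```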

As for the neuron groups, there should be as few connection matrices as possible to minimize the number of interpreted operations. Thus, synapses should be gathered according to synapse type (which state variable is modified) and neuron type (pre- and postsynaptic neuron groups) rather than functionally. Connectivity between subgroups of the same type (e.g., layers) should be defined using submatrices (with array views).

#### 2.3.2. Structures for Spike-Timing-Dependent Plasticity.

In models of spike-timing-dependent plasticity (STDP), both presynaptic spikes and postsynaptic spikes act on synaptic variables. In particular, when a postsynaptic spike is produced, the synaptic weights of all presynaptic neurons are modified, which means that a column of the connection matrix must be accessed for both reading and writing. This is straightforward if this matrix is dense, but more difficult if it is sparse and stored as a list of rows.

For fast column access, we augment the standard CSR matrix to form what we
call a compressed sparse row and column (CSRC) matrix (see Figure 2). We add three additional arrays index, row, and colptr to the CSR matrix. These play essentially
the same roles as val, col,
and rowptr, except that they are column oriented
rather than row oriented, as in a compressed sparse column (CSC) matrix, and
instead of storing values in val, we store the
indices of the corresponding values in index. So
column *j* consists of all elements with row indices row[i] and values val[index[i]], for colptr[j] ≤ *i* < colptr[j+1]. This
allows us to perform vectorized operations on columns in the same way as on
rows. However, the memory requirements are now approximately 20 nnz bytes, and column access will be slower than row access
because of the indirection through the index array
and because the data layout is not contiguous (which will reduce cache
efficiency). The memory requirements could be reduced to roughly 12 nnz bytes by using 32-bit floats and 16-bit indices (as long
as the number of rows and columns were both less than
2^{16}=65,536).
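A sketch of the column-oriented part of a CSRC matrix, built here directly from a small dense matrix for clarity (in practice the arrays would be built incrementally; the example matrix and the use of lexsort/searchsorted are our own illustrative choices):

```python
import numpy as np

# Dense example matrix (zeros = absent synapses).
W = np.array([[0.0, 0.1, 0.0],
              [0.2, 0.0, 0.3],
              [0.4, 0.5, 0.0]])

# Row-oriented part (standard CSR); nonzero() returns row-major order.
rows, cols = W.nonzero()
val = W[rows, cols]
rowptr = np.searchsorted(rows, np.arange(W.shape[0] + 1))

# Column-oriented part: positions in val of each column's entries.
order = np.lexsort((rows, cols))     # sort by column, then by row
index = order                         # index into val, column by column
row = rows[order]                     # row indices, column by column
colptr = np.searchsorted(cols[order], np.arange(W.shape[1] + 1))

# Vectorized access to column j for both reading and writing:
j = 1
sel = index[colptr[j]:colptr[j + 1]]
assert list(row[colptr[j]:colptr[j + 1]]) == [0, 2]  # presynaptic rows
assert np.allclose(val[sel], [0.1, 0.5])
val[sel] *= 2.0                       # e.g., an STDP weight update
assert np.allclose(val[sel], [0.2, 1.0])
```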

### 2.4. Structures for Storing Spike Times.

In order to implement delays and refractoriness, we need a structure to store the
indices of neurons that have spiked recently (see Figure 3). In order to minimize memory use, this structure
should retain these indices only for a fixed period (the maximum delay or
refractory period). The standard structure for this sort of behavior is a
circular array (see section 2.1.2), which has been previously used for this
purpose in nonvectorized simulations (Morrison et al., 2005; Izhikevich, 2006). Each time step, we insert the indices of all neurons that
spiked in the group: x[0:len(spikes)]=spikes, and
shift the cursor by len(spikes). After insertion, the
most recent spikes are always in x[i] for negative i. To extract the spikes produced at a given time
step in the past, we need to store a list of the locations of the spikes stored
at these time steps in the underlying array y. This is
also implemented in a circular array of length duration,
where duration is the number of time steps we want to
remember spikes for. With this array, we can insert new spikes and extract the
set of indices of neurons that spiked at any given time (within duration) at a cost of only *O*(*n*), where *n* is the
number of spikes returned. The algorithm is vectorized over spikes. The length
of the circular array x depends a priori on the number of
spikes produced, which is unknown before the simulation. Thus, we use a dynamic
array (see section 2.1.1) to resize the spike container if necessary. As in the
case of the standard dynamic array, this makes insertion an amortized *O*(1) operation.

The Python code for this structure is shown in appendix A.

## 3. Construction

Building the network consists of initializing *N* neurons and constructing connection matrices with *Np* entries, at a cost

$$\text{Cost} = c_N N + c_C N p,$$

where *c*_{N} is the cost of initializing one neuron and *c*_{C} is the cost of initializing one synapse. The cost of the second term can be significant, because it is comparable to the cost of simulating spike propagation for a duration 1/*F*, where *F* is the average firing rate. Vectorizing neuron initialization is straightforward because the values of a variable for all neurons are represented by a row of the state matrix. For example, the following Python instruction initializes the second state variable of all neurons to a random value between −70 and −55: S[1,:]=-70+rand(N)*15. Vectorizing the construction of connection matrices is more complex. A simple option is to vectorize over synapses. For example, a Brian instruction of the form C.connect_full(P, P, weight=lambda i,j:f(i,j)) connects all pairs of neurons from the same group P with a distance-dependent weight function f (the topology is a ring), where *C* is the connection object, which contains the dense connection matrix C.W. When this method is called, the matrix is built row by row by calling the weight function with arguments *i* and *j* (in Python, the lambda keyword defines a function), where *i* is the row index and *j* is the vector of indices of target neurons. These instructions then involve only *O*(*N*) interpreted operations. More generally, by vectorizing over synapses, the analysis of the computation cost including interpretation reads

$$\text{Cost} = (c_I + c_N N) + (c_I + c_C p) N,$$

where *c*_{I} is the interpretation cost, and the interpretation overhead is negligible when both *N* and *p* are large.
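The row-by-row construction performed by such a method can be sketched in plain NumPy (the weight function f and its parameters are our own illustrative choices for a ring topology):

```python
import numpy as np

N = 100
W = np.empty((N, N))
j = np.arange(N)                 # vector of target indices

# Hypothetical distance-dependent weight on a ring of N neurons;
# any function operating element-wise on vectors would work.
def f(i, j):
    d = np.minimum((j - i) % N, (i - j) % N)   # ring distance
    return np.exp(-d / 10.0)

for i in range(N):               # O(N) interpreted operations in total
    W[i, :] = f(i, j)            # each row built in one vectorized call

assert W.shape == (N, N)
assert np.allclose(W[0, 0], 1.0)          # zero distance -> weight 1
assert np.allclose(W[0, 1], W[0, N - 1])  # symmetric on the ring
```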

With sparse matrices, both the synaptic weights and the list of target neurons must be initialized. A typical case is that of random connectivity, where any given pair of neurons has a fixed probability *x* of being connected and synaptic weights are constant. Again, the connection matrix can be initialized row by row as follows. Suppose the presynaptic group has *N* neurons and the postsynaptic group has *M* neurons. For each row of the connection matrix (i.e., each presynaptic neuron), pick the number *k* of target neurons at random according to a binomial distribution with parameters *M* and *x*. Then pick *k* indices at random between 0 and *M*−1. These indices are the list of target neurons, and the list of synaptic weights is a constant vector of *k* elements [w, w, ..., w]. With this algorithm, constructing each row involves a constant number of vector-based operations. Clearly, synaptic weights need not be constant, and their construction can also be vectorized as previously shown.
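A minimal NumPy sketch of this row-by-row construction (the group sizes, probability, and the choice to draw indices without replacement are our own assumptions; drawing with replacement would correspond to allowing multiple contacts onto the same target):

```python
import numpy as np

N, M = 1000, 1000      # pre- and postsynaptic group sizes
x = 0.02               # connection probability
w = 0.5                # constant synaptic weight
rng = np.random.default_rng(0)

row_ind = []
row_val = []
for i in range(N):
    # Number of targets of neuron i: binomial with parameters M and x.
    k = rng.binomial(M, x)
    # k target indices drawn in [0, M-1].
    row_ind.append(rng.choice(M, size=k, replace=False))
    # Constant weight vector of length k.
    row_val.append(np.full(k, w))

assert all(len(ind) == len(v) for ind, v in zip(row_ind, row_val))
# The mean number of synapses per neuron is close to M*x = 20.
assert abs(np.mean([len(ind) for ind in row_ind]) - M * x) < 2
```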

## 4. Simulation

Simulating a network for one time step consists of (1) updating the state variables, (2) checking threshold conditions and resetting neurons, and (3) propagating spikes. This process is illustrated in Figure 4. We now describe vectorization algorithms for each of these steps, leaving synaptic plasticity to section 5.

### 4.1. State Updates.

Between spikes, neuron states evolve continuously according to a set of
differential equations. Updating the states consists of calculating the value of
the state matrix at the next time step **S**(*t*+d*t*), given the current
value **S**(*t*). Vectorizing these operations is
straightforward, as we show in the next paragraphs. Figure 5 shows that with these vectorized algorithms, the
interpretation time indeed vanishes for large neuron groups.

#### 4.1.1. Linear Equations.

A special case is when the differential system is linear. In this case, the
state vector of a neuron at time *t*+d*t* depends linearly on its state vector at time *t*. Therefore,
the following equality holds: **X**(*t*+d*t*)=**AX**(*t*)+**B**,
where **A** and **B** are determined by the differential
system and can be calculated at the start of the simulation (Hirsch &
Smale, 1974; Rotter & Diesmann, 1999; Morrison, Straube, Plesser,
& Diesmann, 2007).

Each state vector is a column of the state matrix **S**. It follows
that the update operation for all neurons in the group can be compactly
written as a matrix product: **S**(*t*+d*t*)=**AS**(*t*)+**BJ**^{T}, where **J** is a vector of ones (with *N* elements).
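A one-variable sketch of this update (the leaky integrate-and-fire equation d*V*/d*t* = (*V*_{rest}−*V*)/τ and the parameter values are our own example; for this equation the exact matrices **A** and **B** are known in closed form):

```python
import numpy as np

# Exact update matrices for dV/dt = (V_rest - V)/tau with step dt
# (one state variable, so A and B are 1x1):
tau, V_rest, dt = 20e-3, -70.0, 0.1e-3
A = np.array([[np.exp(-dt / tau)]])
B = np.array([[V_rest * (1 - np.exp(-dt / tau))]])

N = 5
S = np.full((1, N), -55.0)       # state matrix: 1 variable, N neurons
J = np.ones((N, 1))              # vector of ones

# One update for the whole group: S(t+dt) = A S(t) + B J^T.
S = A @ S + B @ J.T

assert S.shape == (1, N)
# All membrane potentials decay toward V_rest:
assert np.all(S < -55.0) and np.all(S > V_rest)
```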

#### 4.1.2. Nonlinear Equations.

When equations are not linear, numerical integration schemes such as Euler or
Runge-Kutta must be used. Although these updates cannot be written as a
matrix product, vectorization is straightforward: it only requires
substituting state vectors (rows of matrix **S**) for state variables
in the update scheme. For example, the Euler update *x*(*t*+d*t*)=*x*(*t*)+*f*(*x*(*t*))d*t* is replaced by **X**(*t*+d*t*)=**X**(*t*)+*f*(**X**(*t*))d*t*,
where the function *f* now acts element-wise on the vector **X**.
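A minimal sketch of such a vectorized Euler step (the nonlinear function *f* and the parameter values below are our own illustrative choices, not a model from the text):

```python
import numpy as np

# Hypothetical nonlinear dynamics dX/dt = (X^2 + I)/tau:
tau, I, dt = 10e-3, 1.0, 0.1e-3

def f(X):
    # Acts element-wise on the whole row of the state matrix.
    return (X ** 2 + I) / tau

N = 4
X = np.linspace(-1.0, 1.0, N)     # one state variable for N neurons
X = X + f(X) * dt                  # Euler update for all neurons at once

assert X.shape == (N,)
```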

#### 4.1.3. Stochastic Equations.

Stochastic differential equations can be vectorized similarly by substituting
vectors of random values for the random values in the stochastic update scheme.
For example, a stochastic Euler update would read

$$\mathbf{X}(t+\mathrm{d}t) = \mathbf{X}(t) + f(\mathbf{X}(t))\,\mathrm{d}t + \sigma\sqrt{\mathrm{d}t}\,\mathbf{N}(t),$$

where **N**(*t*) is a vector of
normally distributed random numbers and σ scales the noise. There is no specific difficulty in
vectorizing stochastic integration methods.

### 4.2. Threshold, Reset, and Refractoriness.

#### 4.2.1. Threshold.

Spikes are produced when a threshold condition on the state variables is met. A typical condition would be *X*_{0} > θ for an integrate-and-fire model (where *V*_{m} = *X*_{0}), or *X*_{0} > *X*_{1} for a model with adaptive threshold (where *X*_{1} is the threshold variable) (Deneve, 2008; Platkiewicz & Brette, 2010). Vectorizing this operation is straightforward: (1) apply a Boolean vector operation on the corresponding rows of the state matrix, for example, S[0, :]>theta or S[0, :]>S[1, :]; (2) transform the Boolean vector into an array of indices of true elements, which is also a generic vector-based operation. This would typically be written as spikes=find(S[0, :]>theta), which outputs the list of all neurons that must spike in this time step.

#### 4.2.2. Reset.

In the same way as the threshold operation, the reset consists of a number of operations on the state variables (e.g., *X*_{0} → *V*_{r}) for all neurons that spiked in the time step. This implies (1) selecting a submatrix of the state matrix, where columns are selected according to the spikes vector (indices of spiking neurons), and (2) applying the reset operations on rows of this submatrix. For example, the vectorized code for a standard integrate-and-fire model would read S[0, spikes]=Vr, where Vr is the reset value. For an adaptive threshold model, the threshold would also increase by a fixed amount: S[1, spikes]+=a.

#### 4.2.3. Refractoriness.

Absolute refractoriness in spiking neuron models generally consists of
clamping the membrane potential at a reset value *V*_{r} for a predefined duration. To implement this mechanism, we need to
extract the indices of all neurons that have spiked in the last *k* time steps. These indices are stored contiguously in the circular array defined in section 2.4. To extract this subvector refneurons, we need only obtain the starting index in the circular array where the spikes produced in the *k*th previous time step were stored. Resetting then amounts to executing the following vectorized code: S[0, refneurons]=Vr.

This implicitly assumes that all neurons have the same refractory period. For
heterogeneous refractoriness, we can store an array refractime of length the number of neurons, which
gives the next time at which the neuron will stop being refractory. If the
refractory periods are given as an array refrac and
at time t neurons with indices in the array spikes fire, we update refractime[spikes]=t+refrac[spikes]. At
any time, the indices of the refractory neurons are given by the Boolean
array refneurons=refractime>t.
Heterogeneous refractory times are more computationally expensive than
homogeneous ones because this comparison requires *N* operations (for *N* the number of neurons) rather than just
picking out directly the indices of the neurons that have spiked (on
average, *NFk*d*t* indices for a refractory period of *k* time steps, where *F* is the average firing
rate). A mixed solution that is often very efficient in practice (although
no better in the worst case) is to test the condition refneurons=refractime>t only for
neurons that fired within the most recent period max(refrac) (which may include repeated indices).
If the array of these indices is called candidates,
the code reads: refneurons=candidates[refractime[candidates]>t].
The mixed solution can be chosen when the size of the array candidates is smaller than the number of neurons
(which is not always true if neurons can fire several times within the
maximum refractory period).
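The heterogeneous scheme and the mixed solution can be sketched as follows (all sizes, times, and refractory periods are arbitrary example values):

```python
import numpy as np

N = 6
t = 10.0                                   # current time (arbitrary units)
refrac = np.array([2.0, 5.0, 2.0, 5.0, 2.0, 5.0])  # per-neuron periods
refractime = np.zeros(N)                   # next end of refractoriness

spikes = np.array([1, 4])                  # neurons firing at time t
refractime[spikes] = t + refrac[spikes]

# Full test over all N neurons at a later time:
t = 13.0
refneurons = refractime > t
assert list(np.nonzero(refneurons)[0]) == [1]   # 10+5 > 13, but 10+2 < 13

# Mixed solution: test only recently active candidate indices.
candidates = np.array([1, 4])
ref = candidates[refractime[candidates] > t]
assert list(ref) == [1]
```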

### 4.3. Propagation of Spikes.

Here we consider the propagation of spikes with no transmission delays, which are treated in the next section. Consider synaptic connections between two groups of neurons where presynaptic spikes act on postsynaptic state variable *k*. In a nonvectorized way, the spike propagation algorithm loops over every spiking neuron and every one of its targets. Vectorized over synapses, it reduces to one vector-based operation per spike: for i in spikes: S[k, row_ind[i]]+=row_val[i], where row_ind[i] is the array of indices of the target neurons of neuron *i* and row_val[i] is the array of nonzero values in row *i* of the weight matrix.

Figure 6 shows that, as expected, the interpretation in this algorithm is negligible when the number of synapses per neuron is large.
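A self-contained sketch of this propagation step (the connectivity and weights are arbitrary example values):

```python
import numpy as np

N = 6
S = np.zeros((2, N))              # state matrix: 2 variables, N neurons
k = 1                             # presynaptic spikes act on variable k

# Sparse rows (targets and weights) for 3 presynaptic neurons:
row_ind = [np.array([0, 2]), np.array([1, 2, 4]), np.array([5])]
row_val = [np.array([0.1, 0.2]), np.array([0.3, 0.4, 0.5]), np.array([0.6])]

spikes = [0, 2]                   # neurons that spiked this time step
for i in spikes:                  # one vectorized operation per spike
    S[k, row_ind[i]] += row_val[i]

assert np.allclose(S[k], [0.1, 0.0, 0.2, 0.0, 0.0, 0.6])
```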

### 4.4. Transmission Delays.

#### 4.4.1. Homogeneous Delays.

Implementing spike propagation with homogeneous delays between two neuron groups is straightforward using the structure defined in section 2.4. The same algorithms as described above can be implemented, with the only difference that vectorized state modifications are iterated over spikes in a previous time step (extracted from the spike container) rather than in the current time step.

#### 4.4.2. Heterogeneous Delays.

Heterogeneous delays cannot be simulated in the same way. Morrison et al.
(2005) introduced the idea of
using a queue for state modifications on the postsynaptic side, implemented
as a circular array (see section 2.1.2). For an entire neuron group, this can be vectorized by
combining these queues to form a cylindrical array E[t,
i] where i is a neuron index and t is a time index. Figure 7 illustrates how this cylindrical array can be
used for propagating spikes with delays. At each time step, the row of the
cylindrical array corresponding to the current time step is added to the
state variables of the target neurons: S[k, :]+=E[0,
:]. Then the values in the cylindrical array at the current
time step are reset to 0 (E[0, :] = 0) so that
they can be reused, and the cursor variable is
incremented. Emitted spikes are inserted at positions in the cylindrical
array that correspond to the transmission delays; that is, if the connection
from neuron *i* to *j* has transmission delay d[i, j], then the synaptic weight is added to the
cylindrical array at position E[d[i, j], j]. This
addition can be vectorized over the postsynaptic neurons *j* as described in section 2.1.3.
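The cylindrical-array scheme can be sketched as follows (the sizes, weights, and delay matrix `d` are illustrative assumptions):

```python
import numpy as np

N = 4                              # postsynaptic neurons
D = 5                              # buffer length >= maximum delay + 1
E = np.zeros((D, N))               # scheduled future state modifications
S = np.zeros((2, N))
k = 0
W = np.full((N, N), 0.5)
d = np.tile(np.array([1, 2, 3, 4]), (N, 1))   # d[i, j]: delay from i to j
cursor = 0                         # current time step

def step(spikes):
    global cursor
    row = cursor % D
    S[k, :] += E[row, :]           # deliver modifications due now
    E[row, :] = 0                  # clear the slot so it can be reused
    for i in spikes:               # schedule new spikes, vectorized over j
        E[(cursor + d[i]) % D, np.arange(N)] += W[i]
    cursor += 1

step([0])                          # neuron 0 spikes at t = 0
for _ in range(4):
    step([])                       # its effects arrive at t = 1..4
```

The fancy-indexed addition `E[(cursor + d[i]) % D, np.arange(N)] += W[i]` is the vectorization over postsynaptic neurons described in section 2.1.3.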

## 5. Synaptic Plasticity

### 5.1. Short-Term Plasticity.

In models of short-term synaptic plasticity, synaptic variables (e.g., available synaptic resources) depend on presynaptic activity but not postsynaptic activity. It follows that if the parameter values are the same for all synapses, then it is enough to simulate the plasticity model for each neuron in the presynaptic group rather than for each synapse (see Figure 8).

In the model of Markram et al. (1998), the synaptic variables *x* and *u* are governed by the following differential equations:

$$\tau_d \frac{dx}{dt} = 1 - x, \qquad \tau_f \frac{du}{dt} = U - u,$$

where $\tau_d$ and $\tau_f$ are time constants and *U* is a parameter in [0, 1]. Each presynaptic spike triggers modifications of the variables:

$$x \leftarrow x(1 - u), \qquad u \leftarrow u + U(1 - u).$$

Synaptic weights are multiplicatively modulated by the product *ux* (taken before the update). The variables *x* and *u* are in principle synapse specific; for example, *x* corresponds to the fraction of available resources at that synapse. However, if all parameter values are identical for all synapses, then these synaptic variables always have the same value for all synapses that share the same presynaptic neuron, since the same updates are performed. Therefore, it is sufficient to simulate the synaptic model for all presynaptic neurons.

The differential equations are equivalent to the state update of a neuron group
with variables *x* and *u*, while the discrete
events are equivalent to a reset. Therefore, the synaptic model can be simulated
as a neuron group, using the vectorized algorithms that we previously described.
The only change is that spike propagation must be modified so that synaptic
weights are multiplicatively modulated by *ux*. This simply
amounts to replacing the operation S[k,:]+=W[i,:] by S[k,:]+=W[i,:]*u[i]*x[i]. State updates could
be further accelerated with event-driven updates, since their values are
required only at spike times, as described in Markram et al. (1998) and implemented in, for example, NEST (Morrison
et al., 2008).
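This per-presynaptic-neuron scheme can be sketched as follows (the time step and parameter values are illustrative assumptions):

```python
import numpy as np

N = 100
dt = 1e-4
tau_d, tau_f, U = 0.2, 0.05, 0.5
x = np.ones(N)                 # fraction of available resources, per neuron
u = np.full(N, U)

def stp_step(x, u, spikes):
    # vectorized Euler step of the relaxation toward resting values
    x += dt * (1 - x) / tau_d
    u += dt * (U - u) / tau_f
    # synaptic weights are modulated by u*x taken before the update
    modulation = u[spikes] * x[spikes]
    # discrete updates triggered by presynaptic spikes
    x[spikes] *= 1 - u[spikes]
    u[spikes] += U * (1 - u[spikes])
    return modulation

m = stp_step(x, u, np.array([0]))   # presynaptic neuron 0 spikes
```

The returned modulation factors would multiply the corresponding rows of the weight matrix during spike propagation, as in S[k,:]+=W[i,:]*u[i]*x[i].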

However, if the parameter values of the synaptic model ($\tau_d$, $\tau_f$, *U*) are heterogeneous, or if spike transmission is probabilistic and the updates of *x* and *u* occur only at successful transmissions, then the values of
the synaptic variables are different even when they share the same presynaptic
neuron. In this case, we must store and update as many variables as synapses.
Updates could still be vectorized (e.g., over synapses), but the simulation cost
is much higher.

### 5.2. Spike-Timing-Dependent Plasticity.

Consider a pair-based STDP rule in which each synapse is characterized by two variables *A*_{pre} and *A*_{post}, which are “traces” of pre- and postsynaptic activity and are governed by the following differential equations (see Figure 9):

$$\tau_{pre} \frac{dA_{pre}}{dt} = -A_{pre}, \qquad \tau_{post} \frac{dA_{post}}{dt} = -A_{post}.$$

When a presynaptic spike occurs, the presynaptic trace is updated and the weight is modified according to the postsynaptic trace:

$$A_{pre} \leftarrow A_{pre} + \Delta_{pre}, \qquad w \leftarrow w + A_{post}.$$

When a postsynaptic spike occurs, the symmetrical operations are performed:

$$A_{post} \leftarrow A_{post} + \Delta_{post}, \qquad w \leftarrow w + A_{pre}.$$

With this plasticity rule, all spike pairs interact linearly because the differential equations for the traces are linear and their updates are additive. If only nearest-neighboring spikes interact, then the update of the trace *A*_{pre} should be $A_{pre} \leftarrow \Delta_{pre}$ (and similarly for *A*_{post}).

In the synaptic plasticity model described above, the value of the synaptic
variable *A*_{pre} (resp. *A*_{post}) depends only on presynaptic (resp. postsynaptic) activity. In
this case, we say that the synaptic rule is separable, that is, all synaptic
variables can be gathered into two independent pools of presynaptic and
postsynaptic variables. Assuming that the synaptic rule is the same for all
synapses (same parameter values), this implies that the value of *A*_{pre} (resp. *A*_{post}) is identical for all synapses that share the same presynaptic
(resp. postsynaptic) neuron. Therefore, in the same way as for short-term
plasticity, we need only to simulate as many presynaptic (resp. postsynaptic)
variables *A*_{pre} (resp. *A*_{post}) as presynaptic (resp. postsynaptic) neurons. Simulating the
differential equations that define the synaptic rule can thus be done in exactly
the same way as for simulating state updates of neuron groups, and the
operations on synaptic variables (the trace updates triggered by spikes) can be simulated as resets. However, the modification of
synaptic weights requires specific vectorized operations.

When a presynaptic spike is produced by neuron *i*, the synaptic
weights for all postsynaptic neurons *j* are modified by an
amount *A*_{post}[*j*] (remember that there is only one
postsynaptic variable per neuron in the postsynaptic group). This means that the
entire row *i* of the weight matrix is modified. If the matrix is
dense, the vectorized code would simply read W[i,
:]+=A_post, where A_post is a
vector. Similarly, when a postsynaptic spike is produced, a column of the weight
matrix is modified: W[:, i]+=A_pre. This is not
entirely vectorized since one needs to loop over pre- and postsynaptic spikes,
but the interpretation overhead vanishes for a large number of synapses per
neuron. If the matrix is sparse, the implementation is slightly different since
only the nonzero elements must be modified. If row_target[i] is the array of indices of neurons that
are postsynaptic to neuron *i* and row_val[i] is the array of nonzero values in row *i*, then the code should read: row_val[i]+=A_post[row_target[i]]. The
symmetrical operation must be done for postsynaptic spikes, which implies that
the weight matrix can be accessed by column (see section 2.3.2).
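The row and column updates can be sketched on a dense weight matrix (the sizes, time constants, and trace increments below are illustrative assumptions):

```python
import numpy as np

N_pre, N_post = 3, 4
dt = 1e-4
tau_pre, tau_post = 20e-3, 20e-3
delta_pre, delta_post = 0.01, -0.0105   # trace increments at spike times
W = np.full((N_pre, N_post), 0.5)
A_pre = np.zeros(N_pre)                 # one trace per presynaptic neuron
A_post = np.zeros(N_post)               # one trace per postsynaptic neuron

# one time step with a presynaptic spike (neuron 0) and a postsynaptic
# spike (neuron 2)
A_pre *= 1 - dt / tau_pre               # Euler step of the trace decay
A_post *= 1 - dt / tau_post

for i in [0]:                           # presynaptic spikes: row updates
    A_pre[i] += delta_pre
    W[i, :] += A_post                   # vectorized over postsynaptic j

for j in [2]:                           # postsynaptic spikes: column updates
    A_post[j] += delta_post
    W[:, j] += A_pre                    # vectorized over presynaptic i
```

For a sparse matrix, the row update becomes row_val[i]+=A_post[row_target[i]] as described above, and the column update requires column access to the same structure.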

#### 5.2.1. STDP with Delays.

When transmission delays are considered, the timing of pre- and postsynaptic
spikes in the STDP rule should be defined from the synapse's point of view.
If the presynaptic neuron spikes at time *t*_{pre}, then the spike timing at the synapse is *t*_{pre} + *d*_{a}, where *d*_{a} is the axonal propagation delay. If the postsynaptic neuron spikes at time *t*_{post}, then the spike timing at the synapse is *t*_{post} + *d*_{d}, where *d*_{d} is the dendritic backpropagation delay. In principle, both delays should be specified in addition to the total propagation delay. The latter may not be the sum *d*_{a} + *d*_{d} because dendritic backpropagation and forward propagation are not necessarily identical. Various possibilities are discussed in Morrison et al. (2008). Presumably, if dendritic backpropagation of action potentials is active (e.g., mediated by sodium channels) while the propagation of postsynaptic potentials is passive, then the former delay should be shorter than the latter. For this reason, we choose to neglect dendritic backpropagation delays, which amounts to considering that transmission delays are purely axonal. An alternative (nonvectorized) solution, requiring that dendritic delays are longer than axonal delays, is given in the appendix of Morrison, Aertsen, and Diesmann (2007).

Figure 10 shows an example where *N* presynaptic neurons make excitatory synaptic connections to a single postsynaptic neuron. The delay from presynaptic neuron *i* to the postsynaptic neuron is *i* · *d*. Now suppose each presynaptic neuron fires simultaneously at time 0 and the postsynaptic neuron fires at time (*N*/2)*d*. At the synapses, the presynaptic spikes then arrive at times *i* · *d*, while the postsynaptic spike occurs at time (*N*/2)*d*. In this case, synapse *i* would experience a weight modification corresponding to the timing difference (*N*/2 − *i*)*d*. More generally, for each pair *t*_{pre} and *t*_{post} of pre- and postsynaptic firing times, with a delay *d*, the weight modification corresponds to the timing difference *t*_{post} − *t*_{pre} − *d*.

The traces *A*_{pre} and *A*_{post} are unchanged, but the synaptic modifications occur at different times:

$$w \leftarrow w + A_{pre}(t_{post} - d) \quad \text{at time } t_{post},$$

$$w \leftarrow w + A_{post}(t_{pre} + d) \quad \text{at time } t_{pre} + d.$$

The postsynaptic update (first line) can be easily vectorized over presynaptic neurons (which have different delays *d*) if the previous values of *A*_{pre} are stored in a cylindrical array (see section 2.1.3). At each time step, the current values of *A*_{pre} and *A*_{post} are stored in the row of the array pointed to by a cursor variable, which is then incremented. However, the presynaptic update (second line) cannot be directly vectorized over postsynaptic neurons because the weight modifications are not synchronous (each postsynaptic neuron is associated with a specific delay *d*). We suggest performing these updates at time *t*_{pre} + *D* instead of *t*_{pre} + *d*, where *D* is the maximum delay:

$$w \leftarrow w + A_{post}(t_{pre} + d) \quad \text{at time } t_{pre} + D.$$

The synaptic modification is unchanged (note that we must use *A*_{post}(*t*_{pre} + *d*) and not *A*_{post}(*t*_{pre} + *D*)), only slightly postponed. Although this is not mathematically equivalent, it should make an extremely small difference because individual synaptic modifications are very small. In addition, the exact time of these updates in biological synapses is not known, because synaptic modifications are tested only after many pairings.

## 6. Discussion

We have presented a set of vectorized algorithms for simulating spiking neural
networks. These algorithms cover the whole range of operations required to simulate
a complete network model, including delays and synaptic plasticity. Neuron models
can be expressed as sets of differential equations, describing the evolution of
state variables (either neural or synaptic), and discrete events, corresponding to
actions performed at spike times. The integration of differential equations can be
vectorized over neurons, while discrete events can be vectorized over presynaptic or
postsynaptic neurons (i.e., targets of the event under consideration). Thus, if *N* is the number of neurons and *p* is the
average number of synapses per neuron, then the interpretation overhead vanishes for
differential equations (state updates) when *N* is large (see Figure 5) and for discrete events (spike
propagation, plasticity) when *p* is large (see Figure 6). We previously showed that the Brian simulator, which
implements these algorithms, is only about twice as slow as optimized custom C code
for large networks (Goodman & Brette, 2008). In Figure 11, we compare
our vectorized implementation of a random network (see appendix B) with a
nonvectorized implementation, also written in Python (Figure 12 shows the output of the simulation). Vectorized code is
faster for more than three neurons and is several hundred times faster for a large
number of neurons. In principle, the algorithms could be further optimized by also
vectorizing discrete events over emitted spikes, but this would probably require
some specific vector-based operations (in fact, matrix-based operations).

An interesting extension would be to port these algorithms to graphics processing units (GPUs), which are parallel coprocessors available on graphics cards. These inexpensive units, designed originally and primarily for computer games, are increasingly being used for nongraphical parallel computing (Owens et al., 2007). The chips contain multiple processor cores (512 in current state-of-the-art designs), and parallel programming follows the SIMD (single instruction, multiple data) model, which is particularly well adapted to vectorized algorithms. Thus, we expect that our algorithms could be ported to these chips with minimal changes, which could yield considerable improvements in simulation speed for considerably less human and financial investment than clusters. We have already implemented the state update part on a GPU and applied it to a model-fitting problem, with speed improvements of 50 to 80 times on the GPU compared to single-CPU simulations (Rossant, Goodman, Platkiewicz, & Brette, 2010). The next stage, which is more challenging, is to simulate discrete events on the GPU.

Another challenge is to incorporate integration methods that use continuous spike timings (i.e., not bound to the time grid), which several authors have recently proposed (D'Haene & Schrauwen, 2010; Hanuschkin, Kunkel, Helias, Morrison, & Diesmann, 2010). The first required addition is that spikes must be communicated along with their precise timing. This could be simply done in a vectorized framework by transmitting a vector of spike times. Dealing with these spike timings in a vectorized way seems more challenging. However, in these methods, some of the more complicated operations occur only when spikes are produced (e.g., spike time interpolation) and thus may not need to be vectorized.

Finally, we addressed only the simulation of networks of single-compartment neuron models. The simulation of multicompartmental models could be vectorized over compartments, in the same way as the algorithms we presented are vectorized over neurons. The main difference would be designing a vectorized algorithm for simulating the cable equation on the dendritic tree. Because this amounts to solving a linear problem for a particular matrix structure defined by the neuron morphology (Hines, 1984), it seems like an achievable goal.

## Appendix A: Spike Container

## Appendix B: Example Simulation

The following Python code illustrates many aspects discussed in this article by implementing the simulation of a complete network of spiking neurons. The model corresponds to the CUBA benchmark described in Brette et al. (2007). It is a network of integrate-and-fire neurons with exponential excitatory and inhibitory currents and sparse random connectivity. It also includes homogeneous synaptic delays and refractoriness. The program simulates the network for 400 ms using vectorized algorithms and displays the raster plot and a sample voltage trace:
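A minimal pure-NumPy sketch in the spirit of this benchmark is given below; the network size, parameter values, and connectivity are illustrative assumptions, and plotting is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
N, Ne = 1000, 800               # neurons, of which Ne are excitatory
dt = 0.1e-3                     # time step (s)
taum, taue, taui = 20e-3, 5e-3, 10e-3
Vt, Vr, El = -50e-3, -60e-3, -49e-3   # threshold, reset, resting potential
we, wi = 1.62e-3, -9e-3         # synaptic weight increments (V)
refractory = 5e-3

# sparse random connectivity (2%), stored dense here for simplicity
W = np.where(rng.random((N, N)) < 0.02,
             np.where(np.arange(N)[:, None] < Ne, we, wi), 0.0)

V = Vr + rng.random(N) * (Vt - Vr)   # random initial potentials
ge, gi = np.zeros(N), np.zeros(N)
last_spike = np.full(N, -1.0)
n_spikes = 0

for step in range(int(0.1 / dt)):        # simulate 100 ms
    t = step * dt
    active = (t - last_spike) >= refractory
    # vectorized Euler integration of the state variables
    V[active] += dt * (ge[active] + gi[active] + El - V[active]) / taum
    ge -= dt * ge / taue
    gi -= dt * gi / taui
    spikes = np.nonzero(active & (V > Vt))[0]
    V[spikes] = Vr                        # reset
    last_spike[spikes] = t
    for i in spikes:                      # vectorized spike propagation
        if i < Ne:
            ge += W[i]
        else:
            gi += W[i]
    n_spikes += len(spikes)
```

The network is active because the resting potential El lies above the threshold Vt, so neurons fire spontaneously and drive each other through the random connectivity; collecting `spikes` and `V` over time would yield the raster plot and voltage trace described above.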

## Acknowledgments

We thank all those who tested early versions of Brian and made suggestions for improving it. This work was supported by the European Research Council (ERC StG 240132).