## Abstract

We present a novel approach to quantify the statistical interdependence of two time series, referred to as stochastic event synchrony (SES). The first step is to extract “events” from the two given time series. The next step is to try to align events from one time series with events from the other. The better the alignment, the more similar the two time series are considered to be. More precisely, the similarity is quantified by the following parameters: time delay, variance of the timing jitter, fraction of noncoincident events, and average similarity of the aligned events.

The pairwise alignment and SES parameters are determined by statistical inference. In particular, the SES parameters are computed by maximum a posteriori (MAP) estimation, and the pairwise alignment is obtained by applying the max-product algorithm. This letter deals with one-dimensional point processes; the extension to multidimensional point processes is considered in a companion letter in this issue.

By analyzing surrogate data, we demonstrate that SES is able to quantify both timing precision and event reliability more robustly than classical measures can. As an illustration, neuronal spike data generated by the Morris-Lecar neuron model are considered.

## 1. Introduction

Quantifying the interdependence between time series is an important and challenging problem. Although it is straightforward to quantify linear dependencies, the extension to nonlinear correlations is far from trivial. A variety of approaches have been proposed, stemming from research fields as diverse as physics, statistics, signal processing, and information theory (see, e.g., Stam, 2005; Quian Quiroga, Kraskov, Kreuz, & Grassberger, 2002; Pereda, Quiroga, & Bhattacharya, 2005; Kreuz, Haas, Morelli, Abarbanel, & Politi, 2007; Tiesinga, Fellous, & Sejnowski, 2008).

In this letter we propose a novel measure to quantify the interdependence between two point processes, referred to as stochastic event synchrony (SES). It consists of the following parameters: time delay, variance of the timing jitter, fraction of “noncoincident” events, and average similarity of the events. SES captures two different aspects of synchrony: timing precision and reliability. Those concepts can be understood from the following analogy: when you wait for a train at the station, the train may arrive, or it may not come at all (e.g., it may be out of service because of a mechanical problem). If the train does come, it may or may not be on time. The former uncertainty is related to reliability, whereas the latter is related to precision. A similar distinction between timing precision and reliability has been made in Mainen and Sejnowski (1995) and Tiesinga et al. (2008). SES quantifies precision and reliability by the variance of the timing jitter and the fraction of noncoincident events, respectively.

The pairwise alignment of point processes is cast as a statistical inference problem, which is solved by applying the max-product algorithm on a graphical model (Jordan, 1999; Loeliger, 2004; Loeliger et al., 2007). In the case of one-dimensional point processes, the graphical model is cycle free. The max-product algorithm is then equivalent to dynamic programming and is guaranteed to find the optimal alignment. For multidimensional point processes, the max-product algorithm is applied on a cyclic graphical model; this algorithm yields the optimal alignment as long as the optimal alignment is unique. This letter deals with one-dimensional point processes; the companion letter in this issue considers the extension to multidimensional point processes.

Although the method may be applied to any kind of time series (e.g., finance, oceanography, and seismology), in this and the companion letter, we solely consider time series that occur in the context of neuroscience. Synchrony is indeed an important topic in neuroscience. For instance, it is hotly debated whether the synchronous firing of neurons plays a role in cognition (Varela, Lachaux, Rodriguez, & Martinerie, 2001) and even in consciousness (Singer, 2001; Crick & Koch, 2003). The synchronous firing paradigm has also attracted substantial attention in both the experimental (e.g., Abeles, Bergman, Margalit, & Vaadia, 1993) and the theoretical neuroscience literature (e.g., von der Malsburg, 1981; Amari, Nakahara, Wu, & Sakai, 2003). Moreover, medical studies have reported that many neurophysiological diseases such as Alzheimer's disease are often associated with abnormalities in neural synchrony (Matsuda, 2001; Jeong, 2004). Therefore, the proposed method may be helpful to diagnose such mental disorders. The companion letter presents promising results on the early prediction of Alzheimer's disease based on electroencephalograms (EEG).

This letter considers the interdependence between two point processes. The proposed methods, however, can be extended to a collection of point processes. This extension is nontrivial: aligning a collection of point processes involves a significantly more complex combinatorial optimization problem; the optimal alignment becomes in general intractable. Therefore, one needs to resort to approximate inference methods. Those issues go beyond the scope of this letter and the companion letter and will be addressed in a future report.

In the next section, we introduce SES for the case of one-dimensional point processes. In section 3, we describe the underlying statistical model and explain in section 4 how one can perform inference in that model. In section 5, we review several classical (dis)similarity measures for one-dimensional point processes, since they will serve as benchmarks for SES; more precisely, we will consider the Victor-Purpura distance metric (Victor & Purpura, 1997; Aronov, 2003; Victor, Goldberg, & Gardner, 2007), the van Rossum distance metric (van Rossum, 2001), the Schreiber et al. similarity measure (Schreiber, Fellous, Whitmer, Tiesinga, & Sejnowski, 2003), the Hunter-Milton similarity measure (Hunter & Milton, 2003), and the event synchronization measure proposed in Quian Quiroga, Kreuz, and Grassberger (2002). In section 6 we investigate the robustness and reliability of those classical (dis)similarity measures and SES by means of surrogate data. In section 7 we consider an application related to neuroscience: we quantify the firing reliability of Morris-Lecar type I and type II neurons using classical methods and SES. We offer some concluding remarks in section 8.

## 2. Principle

Let us consider the one-dimensional point processes (“event strings”) *x* and *x*′ in Figure 1a (ignore *y* and *z* for now). These could be point processes in time, for example, (*x*_{1}= 1.3 s, *x*_{2}= 5.8 s, …), or in space, for example, (*x*_{1}= 1.3 m, *x*_{2}= 5.8 m, …), or in any other dimension. We wish to quantify to what extent *x* and *x*′ are synchronized. Intuitively, two event strings can be considered as synchronous (or “locked”) if they are identical apart from a time shift δ_{t}, small deviations in the event occurrence times (“event timing jitter”), and a few event insertions or deletions. More precisely, for two event strings to be synchronous, the event timing jitter should be significantly smaller than the average interevent time, and the number of deletions and insertions should comprise only a small fraction of the total number of events. This intuitive concept of synchrony is illustrated in Figure 1a. The event string *x*′ is obtained from event string *x* by successively shifting *x* over δ_{t} (resulting in *y*), slightly perturbing the event occurrence times (resulting in *z*), and, eventually, adding (plus sign) and deleting (minus sign) events, resulting in *x*′. Adding and deleting events in *z* leads to “noncoincident” events in *x* and *x*′ (see Figure 1a): a noncoincident event in *x* is an event that cannot be paired with an event in *x*′, and vice versa.

We quantify the synchrony of the event strings *x* and *x*′ by the parameters (δ_{t}, *s*_{t}, ρ), where δ_{t} is the time delay between *x* and *x*′, *s*_{t} is the variance of the (event) timing jitter, and ρ is the percentage of noncoincident events, defined as

ρ = (*n*_{non-co} + *n*′_{non-co})/(*n* + *n*′),  (2.1)

with *n* and *n*′ the total number of events in *x* and *x*′, respectively, and *n*_{non-co} and *n*′_{non-co} the total number of noncoincident events in *x* and *x*′, respectively. We will denote the standard deviation of the (event) timing jitter by σ_{t}, and hence *s*_{t} = σ_{t}^{2}. SES is related to the metrics (“distances” or “kernels”) proposed in Victor and Purpura (1997), Aronov (2003), Victor et al. (2007), Shpigelman, Singer, Paz, and Vaadia (2005), Eichhorn et al. (2004), and Schrauwen and Van Campenhout (2007), which are single numbers that quantify the similarity of event strings. In contrast, we characterize synchrony by means of three parameters, which allows us to distinguish two fundamentally different types of synchrony, as we will demonstrate in section 7 (see Figure 13). Moreover, our approach is rooted in statistical inference, in contrast to those metrics, which are derived either from optimization theory (Victor & Purpura, 1997; Aronov, 2003; Victor et al., 2007) or in the context of kernel machines (Shpigelman et al., 2005; Eichhorn et al., 2004; Schrauwen & Van Campenhout, 2007).
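To make these definitions concrete, the following sketch computes δ_{t}, *s*_{t}, and ρ for a pair of toy event strings, assuming the pairing of coincident events is already given (sections 3 and 4 explain how that pairing is actually inferred); all values are illustrative:

```python
import statistics

# Hypothetical event strings (in seconds) and a given alignment:
# pairs of indices (k, k') of coincident events; the rest are noncoincident.
x  = [1.3, 2.1, 3.4, 5.8, 7.0]
xp = [1.5, 2.3, 4.9, 6.0, 7.2, 8.8]
pairs = [(0, 0), (1, 1), (3, 3), (4, 4)]  # x[2], xp[2], xp[5] are noncoincident

offsets = [xp[kp] - x[k] for k, kp in pairs]
delta_t = statistics.mean(offsets)                # estimated time delay
s_t = statistics.pvariance(offsets)               # variance of the timing jitter
n, n_prime = len(x), len(xp)
n_nonco = n - len(pairs)                          # noncoincident events in x
n_nonco_prime = n_prime - len(pairs)              # ... and in x'
rho = (n_nonco + n_nonco_prime) / (n + n_prime)   # fraction of noncoincident events

print(delta_t, s_t, rho)
```

Here all four coincident pairs happen to have the same offset (0.2 s), so the jitter variance is (numerically) zero and ρ = 3/11.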

## 3. Statistical Model

We compute the SES parameters by performing inference in a generative probabilistic model for the sequences *x* and *x*′. In order to describe that model, we consider a symmetric procedure to generate *x* and *x*′, depicted in Figure 1b. Note that the procedure of Figure 1a corresponds to a conditional distribution *p*(*x*′ ∣ *x*; δ_{t}, *s*_{t}), which is asymmetric in *x* and *x*′. First, one generates an event string *v* of length ℓ, where the events *v*_{k} are mutually independent and uniformly distributed in [0, *T*_{0}]. The strings *z* and *z*′ are generated by delaying *v* over −δ_{t}/2 and δ_{t}/2, respectively, and by (slightly) perturbing the resulting event occurrence times. We will model those perturbations as zero-mean gaussian random variables with variance *s*_{t}/2. Next, some of the events in *z* and *z*′ are removed, resulting in the sequences *x* and *x*′; each event of *z* and *z*′ is removed with probability *p*_{d} (“deletion”), independent of the other events. We denote by *z*_{rk} and *z*′_{rk} the events in *z* and *z*′, respectively, that correspond to *v*_{k}. In the example of Figure 1b, *r* = (1, 2, …, 10) = *r*′. Occasionally, a pair of events (*z*_{rk}, *z*′_{rk}) is removed (with probability *p*^{2}_{d}), referred to as “event-pair deletion,” but more often, either *z*_{rk} or *z*′_{rk} is removed (a single-event deletion). If none of the events (*z*_{rk}, *z*′_{rk}) is removed, there is an event in *x* and in *x*′ that corresponds to *v*_{k}; we denote this event pair by (*x*_{jk}, *x*′_{j′k}). In the example of Figure 1b, *j* = (1, 2, 3, 5, 6, 7, 8) and *j*′ = (2, 3, 4, 5, 6, 7, 8). Note that if *z*_{rk} is deleted but not *z*′_{rk}, the corresponding event in *x*′ becomes a noncoincident event, and vice versa. In the example of Figure 1b, events *z*_{1} and *z*′_{5} are deleted (single-event deletions), and as a result, *x*′_{1} and *x*_{5} are noncoincident events; also the pair (*z*_{10}, *z*′_{10}) is removed (an event-pair deletion). It is easily verified that the expected length of the sequences *x* and *x*′ is (1 − *p*_{d})ℓ.

It is noteworthy that this procedure of generating the pair of point processes *x* and *x*′ may easily be extended to a collection of point processes. However, inference in the resulting probabilistic model is tractable only for pairs of point processes. If one considers more than two point processes, one needs to resort to approximate inference techniques. Such methods will be presented in a future report.
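The generative procedure can be sketched as follows; this is an illustrative toy simulation (the function name and parameter values are our own, not part of the model description):

```python
import random

def generate_pair(T0=10.0, ell=50, delta_t=0.2, s_t=0.01, p_d=0.1, seed=0):
    """Generate two point processes x, x' following the symmetric procedure:
    draw v uniformly in [0, T0], shift by -delta_t/2 and +delta_t/2,
    add gaussian jitter with variance s_t/2, then delete events with prob p_d."""
    rng = random.Random(seed)
    v = sorted(rng.uniform(0.0, T0) for _ in range(ell))
    z  = [vk - delta_t / 2 + rng.gauss(0.0, (s_t / 2) ** 0.5) for vk in v]
    zp = [vk + delta_t / 2 + rng.gauss(0.0, (s_t / 2) ** 0.5) for vk in v]
    x  = sorted(zk  for zk  in z  if rng.random() > p_d)   # survivors of z
    xp = sorted(zpk for zpk in zp if rng.random() > p_d)   # survivors of z'
    return x, xp

x, xp = generate_pair()
print(len(x), len(xp))  # expected length of each sequence: (1 - p_d) * ell = 45
```

Such surrogate pairs, with known δ_{t}, *s*_{t}, and *p*_{d}, are also the natural way to test an SES estimator (cf. section 6).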

From now on, we will assume that the event pairs (*x*_{jk}, *x*′_{j′k}) are ordered: (*x*_{jk}, *x*′_{j′k}) occurs after the pair (*x*_{jk − 1}, *x*′_{j′k − 1}), or more precisely, *x*_{jk} ≥ *x*_{jk − 1} and *x*′_{j′k} ≥ *x*′_{j′k − 1} (for all *k*). This assumption is reasonable, since without it, there would be an unwieldy number of possible ways to generate the same point processes *x* and *x*′, and therefore, the problem of inferring the SES parameters would be ill posed. In fact, virtually all measures of event synchrony make use of this assumption explicitly or implicitly (see section 5 for a brief review). However, this assumption has some important consequences, as illustrated in Figure 2. If *s*_{t} is large, there is a high probability that events in *x* and *x*′ will not be ordered in time (see Figure 2a). Ignoring this fact will result in estimates of *s*_{t} that are smaller than the true value. Obviously this issue concerns not only SES but event synchrony in general. In addition, some event deletions may be ignored: in Figure 2a one of the last two events of *x* (and likewise *x*′) is noncoincident; however, in the procedure of Figure 2b, they are both coincident. The latter generative procedure is simpler in the sense that it involves fewer deletions, and the perturbations are slightly smaller. As a result, the parameter ρ (and hence also *p*_{d}) is generally underestimated. Again, this problem concerns not only SES but any measure that quantifies how reliably events occur (see section 6). Both issues may be resolved to some extent by incorporating additional information. For example, in the case of spike trains, one may incorporate information about the spike shape; each spike is then described by its occurrence time and some additional parameters (e.g., shape parameters such as height and width). SES can be extended to incorporate such additional information, as we describe in the companion letter in this issue. When matching events of *x* and *x*′, we then no longer assume that those events are ordered in time; we allow reversals as in Figure 2a.

We now describe the statistical model that corresponds to this generative procedure for *x* and *x*′. For clarity, we list in Table 1 the most relevant variables and parameters associated with that model. We now clarify each of those variables and parameters. The statistical model takes the form

*p*(*x*, *x*′, *v*, ℓ, *b*, *b*′, δ_{t}, *s*_{t}) = *p*(*v*, ℓ) *p*(*b*, *b*′ ∣ ℓ) *p*(*x*, *x*′ ∣ *v*, ℓ, *b*, *b*′, δ_{t}, *s*_{t}) *p*(δ_{t}) *p*(*s*_{t}),  (3.1)

where *b* and *b*′ are binary strings that indicate whether the events in *x* and *x*′ are coincident. More specifically, *b*_{k} = 1 if *x*_{k} is noncoincident, *b*_{k} = 0 otherwise, and likewise for *b*′_{k}. For mathematical convenience, we choose a geometric prior for the length ℓ,

*p*(ℓ) = (1 − λ*T*_{0})(λ*T*_{0})^{ℓ − 1},  (3.2)

with λ*T*_{0} ∈ (0, 1), as illustrated in Figure 3a. Since the events *v*_{k} are assumed to be mutually independent and uniformly distributed in [0, *T*_{0}], we have

*p*(*v* ∣ ℓ) = *T*_{0}^{−ℓ}.  (3.3)

Therefore,

*p*(*v*, ℓ) = *p*(ℓ) *p*(*v* ∣ ℓ) = (1 − λ*T*_{0})(λ*T*_{0})^{ℓ − 1} *T*_{0}^{−ℓ}.  (3.4)

At first sight, it may seem more natural to model *v* as a Poisson process (Gallager, 1996), which corresponds to a model *p*(*v*, ℓ) = *p*(*v* ∣ ℓ) *p*(ℓ), where *p*(*v* ∣ ℓ) is also given by equation 3.3 but with a different prior *p*(ℓ) for the number of events ℓ (see Figure 3b), that is, a Poisson distribution with parameter κ*T*_{0}:

*p*(ℓ) = e^{−κ*T*_{0}} (κ*T*_{0})^{ℓ}/ℓ!.  (3.5)

Table 1: Main Variables and Parameters of the Statistical Model.

| Symbol | Explanation |
|---|---|
| x and x′ | The two given point processes |
| v | Hidden sequence from which the observed sequences x and x′ are generated |
| z and z′ | Point processes obtained by shifting v over δ_{t}/2 and −δ_{t}/2, resp., and perturbing the timing of the resulting sequences (variance s_{t}/2) |
| b and b′ | Binary sequences that indicate whether events in x and x′, resp., are coincident |
| i and i′ | Indices of the events in v that generated x and x′, resp. |
| j and j′ | Indices of the coincident events in x and x′, resp. |
| n and n′ | Length of x and x′, resp. |
| n_{del} and n′_{del} | Number of deletions in z and z′, resp. |
| n^{tot}_{del} | Total number of deletions in z and z′ |
| n_{del,single} and n′_{del,single} | Number of single-event deletions in z and z′, resp. |
| n_{del,pair} and n′_{del,pair} | Number of event-pair deletions in z and z′, resp. |
| n_{non-co} and n′_{non-co} | Number of noncoincident events in x and x′, resp. |
| n^{tot}_{non-co} | Total number of noncoincident events in x and x′ |
| ℓ | Length of v |
| δ_{t} | Timing offset between x and x′ |
| s_{t} | Timing jitter between x and x′ |


A comparison of Figures 3a and 3b shows that a Poisson prior is more informative than a geometric prior, especially if the parameter λ*T*_{0} of the geometric prior takes values close to 1. In fact, among all discrete probability distributions *p*(ℓ) supported on {1, 2, 3, …} with given expected value E[ℓ] = *L*, the geometric distribution with parameter λ*T*_{0} = 1 − 1/*L* is the one with the largest entropy. Therefore, if little prior knowledge about the length ℓ is available, it makes sense to use a geometric prior. On the other hand, if substantial prior knowledge about ℓ is available, it can in principle readily be encoded by a Poisson prior. However, for our purposes, the prior, equation 3.5, is mathematically less convenient. We come back to this issue at the end of section 3 and in section 4. Therefore, we adopt a geometric prior even if there is prior knowledge about ℓ. With that choice of prior, the estimates of the parameters δ_{t}, *s _{t}*, and ρ will in principle be less reliable than with a Poisson prior, since the model does not then incorporate all available information. However, the resulting loss in reliability is expected to be negligible. For reasonable lengths ℓ (e.g., ℓ>30), most information is typically contained in the observed sequences

*x* and *x*′ and not in the prior *p*(ℓ). The posterior *p*(ℓ ∣ *x*, *x*′) is then tightly concentrated around the true value of ℓ, and the prior *p*(ℓ) varies only slightly over the support of *p*(ℓ ∣ *x*, *x*′). Besides this qualitative argument, we will show experimentally in section 6 that with a geometric prior *p*(ℓ), the obtained estimates of δ_{t}, *s*_{t}, and ρ are reliable (apart from biases due to the ambiguity inherent in event synchrony; see Figure 2). Interestingly, for both types of priors, the events in *v* occur to a large extent independently of each other, since for given length ℓ, they are assumed to be mutually independent and uniformly distributed in [0, *T*_{0}] (cf. equation 3.3).
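The entropy argument can also be checked numerically; the sketch below compares a geometric and a Poisson prior with the same mean (the mean L = 30 and the truncation points are illustrative choices):

```python
import math

L = 30.0  # desired expected length E[l]

# Geometric prior on {1, 2, ...}: p(l) = (1 - q) * q**(l - 1) with q = lambda*T0.
# The choice q = 1 - 1/L gives mean L.
q = 1.0 - 1.0 / L
geom = [(1.0 - q) * q ** (l - 1) for l in range(1, 2000)]

# Poisson prior with parameter kappa*T0 = L (same mean), computed in log space
# to avoid overflow in mu**l / l!.
pois = [math.exp(-L + l * math.log(L) - math.lgamma(l + 1)) for l in range(400)]

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

H_geom, H_pois = entropy(geom), entropy(pois)
print(H_geom > H_pois)  # True: the geometric prior has the larger entropy
```

The geometric prior is thus the less informative (maximum-entropy) choice for a given expected length.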

The prior of *b* and *b*′ is given by

*p*(*b*, *b*′ ∣ ℓ) = *p*_{d}^{*n*^{tot}_{del}} (1 − *p*_{d})^{2ℓ − *n*^{tot}_{del}},  (3.6)

where *n*^{tot}_{del} is the total number of deleted events in *x* and *x*′:

*n*^{tot}_{del} = *n*_{del} + *n*′_{del},  (3.7)

with *n*_{del} the number of deleted events in *x*,

*n*_{del} = ℓ − *n*,  (3.8)

and similarly, *n*′_{del} the number of deleted events in *x*′,

*n*′_{del} = ℓ − *n*′.  (3.9)

The total number of deletions *n*^{tot}_{del} can be expressed in terms of *b* and *b*′. We first expand *n*_{del}, the number of deleted events in *z*, as *n*_{del} = *n*_{del,single} + *n*_{del,pair}, where *n*_{del,single} is the number of single-event deletions in *z* and *n*_{del,pair} is the number of event-pair deletions. Likewise, we can write *n*′_{del} = *n*′_{del,single} + *n*_{del,pair}, where *n*′_{del,single} is the number of single-event deletions in *z*′. Since a single deletion in *z* results in a noncoincident event in *x*′, it follows that

*n*_{del,single} = *n*′_{non-co},  (3.10)

and likewise,

*n*′_{del,single} = *n*_{non-co}.  (3.11)

As a consequence, we have *n*_{del} = *n*′_{non-co} + *n*_{del,pair} and *n*′_{del} = *n*_{non-co} + *n*_{del,pair}, and therefore

*n*^{tot}_{del} = *n*^{tot}_{non-co} + 2*n*_{del,pair},  (3.12)

with *n*^{tot}_{non-co} = *n*_{non-co} + *n*′_{non-co}. Combining equation 3.12 with 3.7 results eventually in the following expression for ℓ:

ℓ = (*n* + *n*′ + *n*^{tot}_{non-co})/2 + *n*_{del,pair}.  (3.13)

Note that only the first term on the right-hand side depends on *b* and *b*′. In the example of Figure 1b, ℓ = 10, *n* = 8 = *n*′, *n*_{del} = 2 = *n*′_{del}, *n*^{tot}_{del} = 4, *n*_{del,single} = 1 = *n*′_{del,single}, *n*_{del,pair} = 1, and *n*^{tot}_{non-co} = 2.
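These counting relations can be checked mechanically on the Figure 1b example; the sketch below simply encodes the counts quoted in the text and verifies the identities:

```python
# Counts from the Figure 1b example.
ell = 10                     # length of the hidden sequence v
n = n_prime = 8              # lengths of x and x'
n_del = n_del_prime = 2      # deletions in z and z'
n_del_single = n_del_single_prime = 1
n_del_pair = 1
n_nonco = n_nonco_prime = 1  # noncoincident events in x and x'

# Single deletions in z produce noncoincident events in x', and vice versa.
assert n_del_single == n_nonco_prime
assert n_del_single_prime == n_nonco

# Each number of deletions splits into single and pair deletions.
assert n_del == n_nonco_prime + n_del_pair
assert n_del_prime == n_nonco + n_del_pair

# Total deletions and the resulting expression for ell.
n_tot_del = n_del + n_del_prime
n_tot_nonco = n_nonco + n_nonco_prime
assert n_tot_del == n_tot_nonco + 2 * n_del_pair
assert ell == (n + n_prime + n_tot_nonco) // 2 + n_del_pair
print("all identities hold")
```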

Given *v*, *b*, and *b*′, the conditional distributions of the coincident events in *x* and *x*′ are equal to

*p*(*x*_{k} ∣ *v*, δ_{t}, *s*_{t}) = 𝒩(*x*_{k} − *v*_{ik}; −δ_{t}/2, *s*_{t}/2) and *p*(*x*′_{k} ∣ *v*, δ_{t}, *s*_{t}) = 𝒩(*x*′_{k} − *v*_{i′k}; δ_{t}/2, *s*_{t}/2),  (3.14)

where *v*_{ik} is the event in *v* that corresponds to *x*_{k} and likewise *v*_{i′k}, and

𝒩(*x*; *m*, *s*) = (2π*s*)^{−1/2} exp(−(*x* − *m*)^{2}/(2*s*))  (3.15)

is a univariate gaussian distribution with mean *m* and variance *s*. In the example of Figure 1b, *i* = (2, 3, 4, 5, 6, 7, 8, 9) and *i*′ = (1, 2, 3, 4, 6, 7, 8, 9). For simplicity, we adopt improper priors *p*(δ_{t}) = 1 = *p*(*s*_{t}).

_{t}*v*; later we will marginalize that model with regard to the length ℓ. For a given

*v*, three cases are possible:

_{k}- • The event was not deleted from either
*x*or*x*′, and the corresponding events in*x*and*x*′ are denoted by*x*_{jk}and*x*′_{j′k}. In equation 3.16, the following term appears: Integrating this term over*v*yields . Note that there are_{k}*n*^{tot}_{co}such terms, where*n*^{tot}_{co}is the total number of coincident event pairs: - •
The event was deleted once (from either

*x*or*x*′). In equation 3.16, there is only one gaussian term that corresponds to*v*. Integrating that term over_{k}*v*results in the term 1._{k} - •
The event was deleted twice (from

*x*and*x*′). There are no gaussian terms associated with*v*in equation 3.16. Therefore, expression 3.16 may be considered as constant with regard to_{k}*v*. Integrating this equation over_{k}*v*then leads to a term_{k}*T*_{0}. There are*n*_{del,pair}such terms.

We can marginalize over *v* analytically (cf. equation 3.19) for two reasons:

- We have chosen a uniform conditional distribution *p*(*v* ∣ ℓ), equation 3.3.
- We model the offsets between an event *v*_{k} and the corresponding events *x*_{ℓ} and *x*′_{ℓ′} in *x* and *x*′, respectively, as gaussian random variables. Therefore, the offset between the two events *x*_{ℓ} and *x*′_{ℓ′} is also gaussian distributed.
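The first case exploits the fact that integrating the product of the two gaussian factors over *v*_{k} leaves a single gaussian in the offset *x*′_{j′k} − *x*_{jk}, with mean δ_{t} and variance *s*_{t}. This can be verified numerically (a sketch using a simple Riemann sum; the numerical values are arbitrary):

```python
import math

def gauss(x, m, s):
    """Univariate gaussian density with mean m and variance s."""
    return math.exp(-(x - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)

delta_t, s_t = 0.3, 0.04
a, b = 1.0, 1.45   # a plays the role of x_{jk}, b of x'_{j'k}

# Numerically integrate N(a - v; -delta/2, s/2) * N(b - v; delta/2, s/2) over v.
dv = 1e-4
lo, hi = 0.0, 2.5  # range wide enough to cover the mass of the integrand
integral = sum(gauss(a - v, -delta_t / 2, s_t / 2) *
               gauss(b - v, delta_t / 2, s_t / 2) * dv
               for v in (lo + i * dv for i in range(int((hi - lo) / dv))))

closed_form = gauss(b - a, delta_t, s_t)
print(abs(integral - closed_form) < 1e-3)  # True
```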

We wish to point out that, more generally, the offset between *x*_{ℓ} and *x*′_{ℓ′} may be modeled by any infinitely divisible distribution. A probability distribution *f* on the real line is by definition infinitely divisible if the following holds: if *X* is any random variable whose distribution is *f*, then for every positive integer *m*, there exist *m* independent identically distributed random variables *X*_{1}, …, *X*_{m} whose sum is equal in distribution to *X*. Note that those *m* other random variables do not usually have the same probability distribution as *X*. In our setting, the offset between *x*_{ℓ} and *x*′_{ℓ′} takes the role of *X*; we have *m* = 2, and the variables *X*_{1} and *X*_{2} stand for the offset between *v*_{k} and *x*_{ℓ} and the offset between *x*′_{ℓ′} and *v*_{k}, respectively. If the distribution *f* of the offset between *x*_{ℓ} and *x*′_{ℓ′} is infinitely divisible, we can decompose that offset as the sum of the offset between *v*_{k} and *x*_{ℓ} and the offset between *x*′_{ℓ′} and *v*_{k}. Indeed, since *f* is assumed to be infinitely divisible, there exists a distribution *f̃* such that the offset between *v*_{k} and *x*_{ℓ} and the offset between *x*′_{ℓ′} and *v*_{k} are independently distributed according to *f̃*, and their sum is distributed according to *f*.

Examples of infinitely divisible distributions are the gaussian and Cauchy distributions and, more generally, all members of the stable distribution family, a four-parameter family of continuous probability distributions with the property of stability: if a number of independent and identically distributed random variables have a stable distribution, then a linear combination of these variables has the same distribution, except for possibly different shift and scale parameters (Zolotarev, 1986). We will not further consider the extension to general infinitely divisible distributions, since gaussian offsets suffice for our purposes.

Marginalizing the model over *v* along these lines yields *p*(*x*, *x*′, *b*, *b*′, δ_{t}, *s*_{t}, ℓ), equation 3.20, where *n* and *n*′ are the lengths of the given point processes *x* and *x*′, respectively. The first term in equation 3.13 is fixed for given *b* and *b*′. Therefore, marginalizing *p*(*x*, *x*′, *b*, *b*′, δ_{t}, *s*_{t}, ℓ), equation 3.20, over ℓ is equivalent to marginalizing over *n*_{del,pair}, leading to equations 3.21 to 3.24. We wish to point out that equation 3.23 contains the sum of a geometric series; since |*p*^{2}_{d}λ*T*_{0}| < 1, we can apply the well-known formula for the sum of a geometric series, resulting in equation 3.24. We can rewrite the latter expression as equation 3.25, with the constants β and γ defined in equations 3.26 and 3.27, respectively. The constant γ does not depend on *b* and *b*′, and therefore it is irrelevant for estimating *b*, *b*′, and the SES parameters ρ, δ_{t}, and *s*_{t}. We will discard it in the following. On the other hand, the exponent of β in equation 3.25 clearly depends on *b* and *b*′ (cf. equations 3.10 and 3.11). Therefore, the parameter β affects the inference of *b*, *b*′, and the SES parameters. In section 7, we explain how the parameter β may be determined from given sequences *x* and *x*′. Moreover, we will interpret the parameter β in terms of cost functions (see equation 3.29): the expression log β is part of the cost associated with each noncoincident event.

After marginalizing with regard to *v* and ℓ, we obtain a model *p*(*x*, *x*′, *b*, *b*′, δ_{t}, *s _{t}*) (cf. equation 3.25) that is symmetric in

*x* and *x*′. In the following, we will denote model 3.25 by *p*(*x*, *x*′, *j*, *j*′, δ_{t}, *s*_{t}) instead of *p*(*x*, *x*′, *b*, *b*′, δ_{t}, *s*_{t}), since it is more natural to describe that model in terms of *j* and *j*′ than in terms of *b* and *b*′ (cf. the right-hand side of equation 3.25). The sequences *b* and *b*′ may be obtained directly from *j* and *j*′: the variables *b*_{k} (for all *k*) equal zero if *k* appears in the sequence *j* and one otherwise. The variables *b*′_{k} (for all *k*) may be obtained along the same lines.

The constant ζ′ in equation 3.29 does not depend on *j* and *j*′; in other words, it is independent of the assignment of coincident and noncoincident events. In the following, we investigate how equation 3.29 depends on the assignment *j* and *j*′; the constant ζ′ is then irrelevant.

Expression 3.29 may be interpreted as the cost of the assignment *j* and *j*′, for fixed δ_{t} and *s*_{t}. The unit cost *d*(*s*_{t}) associated with each noncoincident event is given by equation 3.30. The unit cost *d*(*x*_{jk}, *x*′_{j′k}; δ_{t}, *s*_{t}) of each event pair (*x*_{jk}, *x*′_{j′k}) is the normalized Euclidean distance

*d*(*x*_{jk}, *x*′_{j′k}; δ_{t}, *s*_{t}) = (*x*′_{j′k} − *x*_{jk} − δ_{t})^{2}/(2*s*_{t}).  (3.31)

Since the point processes *x* and *x*′ of Figure 1b are defined on the real line, the (normalized) Euclidean distance is indeed a natural metric. In some applications, the point process may be defined on more general curves (as illustrated in Figure 4); in such situations, one may adopt non-Euclidean distance measures. We are currently exploring such applications (see Dauwels et al., in press, for preliminary results).
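As an illustration of this cost interpretation, the sketch below scores two candidate assignments; the unit cost per noncoincident event is kept as a free parameter `d_nonco` (its precise expression, equation 3.30, involves β and *s*_{t}), and the pair cost is assumed to be the normalized Euclidean distance:

```python
def pair_cost(xk, xpk, delta_t, s_t):
    """Normalized Euclidean distance of a coincident event pair (dimensionless)."""
    return (xpk - xk - delta_t) ** 2 / (2 * s_t)

def alignment_cost(x, xp, pairs, delta_t, s_t, d_nonco):
    """Cost of an assignment: one unit cost per noncoincident event,
    plus the pair cost of every coincident event pair."""
    n_nonco = (len(x) - len(pairs)) + (len(xp) - len(pairs))
    return n_nonco * d_nonco + sum(
        pair_cost(x[k], xp[kp], delta_t, s_t) for k, kp in pairs)

x  = [1.0, 2.0, 3.0]
xp = [1.1, 2.1, 4.5]
good = alignment_cost(x, xp, [(0, 0), (1, 1)], delta_t=0.1, s_t=0.01, d_nonco=2.0)
bad  = alignment_cost(x, xp, [(0, 0), (1, 1), (2, 2)], delta_t=0.1, s_t=0.01, d_nonco=2.0)
print(good < bad)  # leaving 3.0 and 4.5 unmatched is cheaper here
```

Forcing the distant events 3.0 and 4.5 into a pair incurs a large normalized distance; declaring them noncoincident costs only two unit costs.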

Both unit costs *d*(*s*_{t}), equation 3.30, and *d*(*x*_{jk}, *x*′_{j′k}; δ_{t}, *s*_{t}), equation 3.31, are dimensionless. In other words, they do not depend on the unit (e.g., seconds, milliseconds, meters, or millimeters) in which *x* and *x*′ are expressed. This property is obvious for *d*(*x*_{jk}, *x*′_{j′k}; δ_{t}, *s*_{t}). For *d*(*s*_{t}), it can be shown as follows: the parameter *p*_{d} is dimensionless, and the same holds for the product appearing in equation 3.30; hence *d*(*s*_{t}) is also dimensionless. Therefore, minimizing the cost (see equation 3.29) with regard to the sequences *j* and *j*′ (for fixed δ_{t} and *s*_{t}), or equivalently, performing maximum a posteriori estimation of *j* and *j*′ in *p*(*x*, *x*′, *j*, *j*′, δ_{t}, *s*_{t}, ℓ), equation 3.25, will yield the same solutions *ĵ* and *ĵ*′, independent of the units of *x* and *x*′. In section 4 we explain how one may estimate *j* and *j*′.

By interpreting equation 3.29 as a cost function, we have also established a connection between SES and the distance metric of Victor and Purpura (1997), Aronov (2003), and Victor et al. (2007). The latter is also formulated in terms of a cost function; more precisely, it is determined as the minimum cost associated with transforming one point process into the other. In this transformation, one is allowed to delete and insert events and to move events over time (cf. Figure 1a), and there is a cost associated with each of those three basic operations. For the sake of completeness, we will review the distance metric of Victor and Purpura (1997), Aronov (2003), and Victor et al. (2007) in section 5.1.

Note also that the Poisson prior, equation 3.5, leads to a cost that depends nonlinearly on the number of noncoincident events. The cost per noncoincident event is then not constant but depends on the total number of noncoincident events. Since a constant cost per noncoincident event is easier to interpret and leads to a simpler inference algorithm (see section 4), we decided to use the geometric prior, equation 3.2.

## 4. Statistical Inference

Given the point processes *x* and *x*′, we wish to infer the parameters δ_{t} and *s*_{t} and the sequences *j* and *j*′ (cf. equation 3.25). From decisions *ĵ* and *ĵ*′, one can easily determine the corresponding decisions *b̂* and *b̂*′: the decision *b̂*_{k} (for all *k*) equals zero if *k* appears in the sequence *ĵ* and is one otherwise. The decisions *b̂*′_{k} (for all *k*) may be obtained along the same lines. The decisions *b̂* and *b̂*′ naturally amount to an estimate of ρ (cf. equation 2.1):

ρ̂ = (*n̂*_{non-co} + *n̂*′_{non-co})/(*n* + *n*′).  (4.1)

Moreover, the parameters *T*_{0}, λ, and *p*_{d} are unknown and need to be chosen appropriately. Interestingly, they do not need to be specified individually, since they appear in equation 3.25 only through β. The latter serves in practice as a knob to control the number of noncoincident events; we address this issue in section 7.

We estimate *j*, *j*′, δ_{t}, and *s*_{t} by alternating two conditional maximizations:

(*ĵ*, *ĵ*′) = argmax_{*j*, *j*′} *p*(*x*, *x*′, *j*, *j*′, δ̂_{t}, *ŝ*_{t}),  (4.2)

(δ̂_{t}, *ŝ*_{t}) = argmax_{δ_{t}, *s*_{t}} *p*(*x*, *x*′, *ĵ*, *ĵ*′, δ_{t}, *s*_{t}).  (4.3)

In appendix A, we explain how expressions 4.2 and 4.3 can be computed. In particular, we derive a closed-form expression for equation 4.3. We show that equation 4.2 may be obtained by considering paths on an (*n* + 1) × (*n*′ + 1) grid (see Figure 5), where each path corresponds to a pair (*j*, *j*′). One can associate a cost *M* to each path, obtained by adding the costs *d*(*s*_{t}) of each corresponding noncoincident event and the costs

*d*(*x*_{jk}, *x*′_{j′k}; δ_{t}, *s*_{t}) associated with each pair of coincident events (*x*_{jk}, *x*′_{j′k}). Finding the pair (*ĵ*, *ĵ*′), equation 4.2, corresponds to the problem of determining the minimum-cost path, which may be achieved by means of a simple recursion. The resulting algorithm is summarized in Table 2. (Matlab code of the algorithm is available at http://www.dauwels.com/SESToolbox/SES.html.)
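A minimal reimplementation of the minimum-cost-path recursion on the (*n* + 1) × (*n*′ + 1) grid may look as follows (an illustrative sketch, not the authors' Matlab toolbox; the unit cost per noncoincident event is again kept as a free parameter):

```python
def align(x, xp, delta_t, s_t, d_nonco):
    """Minimum-cost alignment of two event sequences by dynamic programming.
    Grid cell (k, kp) holds the cost of aligning x[:k] with xp[:kp].
    Moves: skip an event of x or xp (cost d_nonco per noncoincident event),
    or match x[k-1] with xp[kp-1] (normalized Euclidean pair cost)."""
    n, np_ = len(x), len(xp)
    INF = float("inf")
    M = [[INF] * (np_ + 1) for _ in range(n + 1)]
    back = [[None] * (np_ + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for k in range(n + 1):
        for kp in range(np_ + 1):
            if k > 0 and M[k - 1][kp] + d_nonco < M[k][kp]:
                M[k][kp] = M[k - 1][kp] + d_nonco      # x[k-1] noncoincident
                back[k][kp] = (k - 1, kp)
            if kp > 0 and M[k][kp - 1] + d_nonco < M[k][kp]:
                M[k][kp] = M[k][kp - 1] + d_nonco      # xp[kp-1] noncoincident
                back[k][kp] = (k, kp - 1)
            if k > 0 and kp > 0:
                c = M[k - 1][kp - 1] + \
                    (xp[kp - 1] - x[k - 1] - delta_t) ** 2 / (2 * s_t)
                if c < M[k][kp]:
                    M[k][kp] = c                       # coincident event pair
                    back[k][kp] = (k - 1, kp - 1)
    # Trace back the minimum-cost path to recover the coincident pairs (j, j').
    pairs, cell = [], (n, np_)
    while cell != (0, 0):
        prev = back[cell[0]][cell[1]]
        if prev == (cell[0] - 1, cell[1] - 1):
            pairs.append((cell[0] - 1, cell[1] - 1))
        cell = prev
    return M[n][np_], pairs[::-1]

cost, pairs = align([1.0, 2.0, 3.0], [1.1, 2.1, 4.5], 0.1, 0.01, 2.0)
print(pairs)  # [(0, 0), (1, 1)]: the events 3.0 and 4.5 stay noncoincident
```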

Note that if one adopts a Poisson prior, equation 3.5, expression 4.2 may no longer be obtained by determining the minimum-cost path on a grid. In fact, this expression becomes intractable, and one would need to resort to approximate methods.

The algorithm is an instance of coordinate descent, which generally is guaranteed to converge if the iterated conditional maximizations have unique solutions (cf. equations 4.2 and 4.3; Bezdek & Hathaway, 2002; Bezdek, Hathaway, Howard, Wilson, & Windham, 1987). The conditional maximization, equation 4.3, has a unique solution (cf. equations A.1 and A.3); the solution of equation 4.2 is unique in most practical situations. In our experiments (cf. sections 6 and 7), the algorithm always converged. We provide numerical results on convergence in section 7.

In general, the fixed points of a coordinate descent algorithm are stationary points of the objective function at hand. In particular, alternating equations 4.2 and 4.3 converges to stationary points of *p*(*x*, *x*′, *j*, *j*′, δ_{t}, *s*_{t}). Since this model may have numerous stationary points, it may be necessary to run these equations with several different initial values δ̂_{t}^{(0)} and *ŝ*_{t}^{(0)}, resulting in several fixed points. Eventually the fixed point with the largest value of *p*(*x*, *x*′, *j*, *j*′, δ_{t}, *s*_{t}) is selected. In practice, there is often prior knowledge about δ_{t} and *s*_{t}. For example, in the case of neural spike trains (see section 7), the lag δ_{t} is usually not larger than 100 ms; similarly, *s*_{t} is typically not larger than (100 ms)^{2}. In most applications, it makes sense to start with the initial value δ̂_{t}^{(0)} = 0. However, depending on the available computational resources, the algorithm may be run with additional initial values of δ̂_{t}^{(0)}. It may also be run with several values of *ŝ*_{t}^{(0)} between 0 and the largest plausible value (e.g., (10 ms)^{2}, (20 ms)^{2}, …, (100 ms)^{2}). For the sake of completeness, we specify how we selected the initial values δ̂_{t}^{(0)} and *ŝ*_{t}^{(0)} in the applications of sections 6 and 7.

In principle, the computational complexity grows proportionally to *nn*′ (i.e., the product of both sequence lengths). However, the state space can be restricted to pairs of events that are close to each other—pairs of events that fulfill the constraint (for some δ^{max}_{t} > 0, e.g., δ^{max}_{t} = 100 ms). The paths on the grid of Figure 5 then remain close to the diagonal, and only the entries around the diagonal of *M* need to be computed (cf. equation A.4). As a result, the computational complexity becomes linear in the sequence length.
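The minimum-cost path computation can be sketched as the following dynamic program. This is an illustrative sketch, not the letter's exact recursion from Table 2: the fixed per-event cost `d_s` and the quadratic timing penalty stand in for the costs *d*(·) derived in appendix A, and the `band` argument implements the near-diagonal restriction described above.

```python
from math import inf

def align_min_cost(x, xp, delta=0.0, s_t=1.0, d_s=1.0, band=None):
    """Minimum-cost path on the (n+1) x (n'+1) grid.

    d_s is the cost of leaving an event noncoincident; a coincident
    pair (x[j], xp[k]) incurs a quadratic timing penalty (both costs
    are stand-ins for the d(.) terms of appendix A).  Passing a finite
    band keeps only near-diagonal entries, making the complexity
    linear in the sequence length."""
    n, m = len(x), len(xp)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for j in range(n + 1):
        for k in range(m + 1):
            if j > 0:
                D[j][k] = min(D[j][k], D[j - 1][k] + d_s)   # x[j-1] noncoincident
            if k > 0:
                D[j][k] = min(D[j][k], D[j][k - 1] + d_s)   # xp[k-1] noncoincident
            if j > 0 and k > 0:
                off = xp[k - 1] - x[j - 1] - delta
                if band is None or abs(off) <= band:
                    # pair the two events as coincident
                    D[j][k] = min(D[j][k], D[j - 1][k - 1] + off * off / (2.0 * s_t))
    return D[n][m]
```

Identical sequences align at zero cost, while events farther apart than the band can only be declared noncoincident.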

## 5. Review of Classical Similarity Measures

In this section, we review some of the best-known classical (dis)similarity measures for one-dimensional point processes, including the Victor-Purpura distance metric (Victor & Purpura, 1997; Aronov, 2003; Victor et al., 2007), the van Rossum distance metric (van Rossum, 2001), the Schreiber et al. similarity measure (Schreiber et al., 2003), the Hunter-Milton similarity measure (Hunter & Milton, 2003), and the event synchronization measure proposed in Quian Quiroga, Kreuz, et al. (2002). We have chosen the same selection of classical measures as in Kreuz et al. (2007). Note that this selection is not exhaustive; various other (dis)similarity measures for one-dimensional point processes have recently been proposed, such as the ISI distance in Kreuz et al. (2007). In order to benchmark SES in a meaningful way, however, we have restricted ourselves to existing measures that are the most related to SES.

In this section, we closely follow the exposition in Kreuz et al. (2007). For the sake of definiteness, we will discuss the measures in the context of point processes in time.

### 5.1. Victor-Purpura Spike Train Metric.

The distance metric *D _{V}* (Victor & Purpura, 1997; Aronov, 2003; Victor et al., 2007) is closely related to SES, as we pointed out earlier. It defines the distance between two point processes as the minimum cost of transforming one point process into the other. This transformation is carried out by combining three basic operations: event insertion, event deletion, and event movement (cf. Figure 1a). The cost of deleting or inserting an event is set to one, whereas the cost of moving an event in time is proportional to the time shift; the proportionality constant *C _{V}* defines the timescale of the distance metric.

If *C _{V}* = 0, the distance metric *D _{V}* reduces to the difference in number of events. If *C _{V}* ≫ 1, the distance quantifies the number of noncoincident events. Indeed, since for large *C _{V}* it becomes less favorable to move events, one point process is transformed into the other mostly by inserting and deleting events. The cost *D _{V}* associated with this transformation is then (approximately) proportional to the number of noncoincident events. In practice, most events do not perfectly coincide; therefore, in the limit *C _{V}* ≫ 1, virtually all events are either inserted or deleted in the transformation from one point process to the other. In the intermediate regime *C _{V}* ≈ 1, neighboring events are treated as coincident; they no longer need to occur at precisely the same time. In that regime, *D _{V}* is similar to the SES parameter ρ.

It is important to realize that *C _{V}* is not dimensionless. Since the cost *C _{V}*Δ*T* associated with moving an event over Δ*T* is supposed to be dimensionless, the unit of *C _{V}* is the inverse of the unit in which *x* and *x*′ are expressed. For example, if *x* and *x*′ are expressed in milliseconds, the condition *C _{V}* ≈ 1 stands for *C _{V}* ≈ 10^{−3} ms^{−1}. Note that the metric *D _{V}* itself is dimensionless.

If and only if the point processes *x* and *x*′ are identical, the distance metric *D _{V}* = 0. The time constant τ_{V} = 1/*C _{V}*, the inverse of *C _{V}*, defines the timescale of the distance metric *D _{V}*.

It is noteworthy that in SES, the cost of moving an event in time is quadratic in the time shift (with proportionality constant 1/2*s _{t}*; cf. equation 3.31), in contrast to the Victor-Purpura metric, where the cost is linear in the time shift. Note that the proportionality constant *C _{V}* in the Victor-Purpura metric is fixed and needs to be chosen by the user, whereas the proportionality constant 1/2*s _{t}* in SES is determined adaptively from the given point processes. In both approaches, the minimum cost is computed by the Viterbi algorithm (Forney, 1973; cf. appendix A).
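The edit-distance recursion behind the Victor-Purpura metric can be sketched as follows; this is a minimal implementation of the three basic operations (the function name and interface are ours):

```python
def victor_purpura(x, xp, c_v):
    """Victor-Purpura distance: unit cost to insert or delete an event,
    c_v * |dt| to move an event over dt; standard edit-distance recursion."""
    n, m = len(x), len(xp)
    G = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        G[i][0] = float(i)                  # delete every event of x
    for j in range(1, m + 1):
        G[0][j] = float(j)                  # insert every event of xp
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = min(G[i - 1][j] + 1.0,                      # delete x[i-1]
                          G[i][j - 1] + 1.0,                      # insert xp[j-1]
                          G[i - 1][j - 1]
                          + c_v * abs(x[i - 1] - xp[j - 1]))      # move event
    return G[n][m]
```

The limits discussed above are easy to check: `victor_purpura(x, xp, 0.0)` returns the difference in event counts, and for very large `c_v` every mismatched event is inserted or deleted rather than moved.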

### 5.2. Van Rossum Distance Measure.

Each event *x _{k}* of *x* is convolved with an exponential function exp(−(*t* − *x _{k}*)/τ_{R}) (for *t* > *x _{k}*), resulting in the time series *s*(*t*). Likewise, each event of *x*′ is convolved with this exponential function, leading to the time series *s*′(*t*). From the time series *s*(*t*) and *s*′(*t*), the van Rossum distance measure (van Rossum, 2001) is computed as

Note that *D _{R}* = 0 if and only if *x* and *x*′ are identical. The timescale of this distance measure is determined by the time constant τ_{R}.
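Because the causal exponential kernels overlap pairwise in closed form, the distance can be computed without sampling the filtered traces. The sketch below assumes the common normalization in which two far-apart single-spike trains have distance 1; the letter's exact normalization (its equation is not reproduced above) may differ:

```python
from math import exp, sqrt

def van_rossum(x, xp, tau):
    """Van Rossum distance via the closed-form kernel overlap: the
    integral of exp(-(t-a)/tau) * exp(-(t-b)/tau) over t > max(a, b)
    equals (tau/2) * exp(-|a - b|/tau), so the squared L2 distance
    between the filtered traces reduces to pairwise sums."""
    def corr(a, b):
        return sum(exp(-abs(t - u) / tau) for t in a for u in b)
    val = 0.5 * (corr(x, x) + corr(xp, xp) - 2.0 * corr(x, xp))
    return sqrt(max(0.0, val))          # clamp tiny negative round-off
```

Identical trains give distance 0; two single spikes much farther apart than τ give distance 1 under this normalization.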

### 5.3. Schreiber et al. Similarity Measure.

The point processes *x* and *x*′ are first convolved with a filter, resulting in time series *s*(*t*) and *s*′(*t*). The filter may, for example, be exponential (Haas & White, 2002) or gaussian (Schreiber et al., 2003), and it has a certain width τ_{S}. Next, the pairwise correlation between the time series *s*(*t*) and *s*′(*t*) is computed:

In Haas and White (2002), one adjusts the phase lag between the time series, whereas in Schreiber et al. (2003), no phase lag is allowed. In this letter (as in Kreuz et al., 2007), we consider the approach of Schreiber et al. (2003). Note that the width τ_{S} of the filter defines the timescale of interaction between the two point processes. We also point out that *S _{S}* = 1 if and only if *x* and *x*′ are identical.
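With gaussian filters, this correlation can likewise be evaluated in closed form, since the overlap of two gaussians of width τ_{S} is itself gaussian in the event offset. A sketch, assuming the measure is the normalized cross-correlation of the filtered traces:

```python
from math import exp, sqrt

def schreiber(x, xp, tau):
    """Schreiber et al. correlation: gaussian-filter both trains and
    take the normalized cross-correlation of the filtered traces; the
    gaussian overlap integral is evaluated in closed form."""
    def corr(a, b):
        # integral of two unit-area gaussians of width tau centered
        # at t and u is proportional to exp(-(t-u)^2 / (4 tau^2))
        return sum(exp(-(t - u) ** 2 / (4.0 * tau * tau)) for t in a for u in b)
    return corr(x, xp) / sqrt(corr(x, x) * corr(xp, xp))
```

Identical trains give correlation 1, and trains with all events much farther apart than τ give a correlation near 0.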

### 5.4. Hunter-Milton Similarity Measure.

For each event *x _{k}* of *x*, one identifies the nearest event in the point process *x*′, and the degree of coincidence between those two events is determined. Along the same lines, one identifies for each event of *x*′ the nearest event in the point process *x* and determines the degree of coincidence. The similarity *S _{H}* between *x* and *x*′ is then computed as

The parameter τ_{H} sets the timescale for event coincidence. If *x* and *x*′ are identical, we have *S _{H}* = 1.
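A sketch of the Hunter-Milton similarity follows. The exponential coincidence kernel is an assumption on our part, since the letter's formula for the degree of coincidence is not reproduced above:

```python
from math import exp

def hunter_milton(x, xp, tau):
    """Hunter-Milton similarity: score each event by its nearest
    neighbor in the other train (exponential coincidence kernel
    assumed here) and average the two directions."""
    def mean_coincidence(a, b):
        # nearest-neighbor offset for each event of a, scored by the kernel
        return sum(exp(-min(abs(t - u) for u in b) / tau) for t in a) / len(a)
    return 0.5 * (mean_coincidence(x, xp) + mean_coincidence(xp, x))
```

As stated above, identical trains yield *S _{H}* = 1, and the value decays toward 0 once all nearest-neighbor offsets greatly exceed τ_{H}.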

### 5.5. Event Synchronization.

Event synchronization relies on a lag τ_{Q}. This lag can be fixed by the user, or it can be extracted automatically from the point processes *x* and *x*′ (cf. equation 5.4). One counts the number of times an event appears in *x* shortly after an event appears in *x*′ (and vice versa), where τ_{Q} may be fixed or may be computed according to equation 5.4.

If and only if all events in *x* and *x*′ are coincident, we have *S _{Q}* = 1.
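A sketch of event synchronization follows. The counting rule and the adaptive lag (half the smallest neighboring inter-event interval) follow the usual definition of Quian Quiroga, Kreuz, et al. (2002), but since equation 5.4 is not reproduced above, treat the details as assumptions:

```python
def event_sync(x, xp, tau=None):
    """Event synchronization: count events of one train that fall
    within a lag after an event of the other, symmetrize, and
    normalize by sqrt(n * n').  tau=None uses the adaptive lag:
    half the smallest neighboring inter-event interval."""
    def local_tau(a, i, b, j):
        isis = []
        for seq, k in ((a, i), (b, j)):
            if k > 0:
                isis.append(seq[k] - seq[k - 1])
            if k < len(seq) - 1:
                isis.append(seq[k + 1] - seq[k])
        return 0.5 * min(isis) if isis else float("inf")

    def count(a, b):
        c = 0.0
        for i, t in enumerate(a):
            for j, u in enumerate(b):
                lag = tau if tau is not None else local_tau(a, i, b, j)
                if t == u:
                    c += 0.5                 # simultaneous events count half
                elif 0.0 < t - u <= lag:
                    c += 1.0
        return c

    return (count(x, xp) + count(xp, x)) / (len(x) * len(xp)) ** 0.5
```

Consistent with the statement above, two identical trains yield *S _{Q}* = 1.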

### 5.6. Discussion.

#### 5.6.1. Binning.

Interestingly, the classical approaches and SES do not discretize the time (or space) axis, in contrast to other methods (e.g., Johnson, Gruner, Baggerly, & Seshagiri, 2001; Christen, Kohn, Ott, & Stoop, 2006). The latter divide the time (space) axis into bins and then convert the point processes into binary sequences: if an event occurred within a bin, a one is associated with that bin; otherwise, a zero. A critical issue is the choice of bin width, since the results may depend on this parameter. SES and the classical measures avoid that issue since they do not rely on binning. This also holds for other measures, for example, the ISI distance (Kreuz et al., 2007).

#### 5.6.2. Timescale.

Several of the above measures depend on a parameter that defines the timescale of the interaction between the point processes, in particular, the Victor-Purpura distance metric (Victor & Purpura, 1997; Aronov, 2003; Victor et al., 2007), the van Rossum distance measure (van Rossum, 2001), the Schreiber et al. similarity measure (Schreiber et al., 2003), and the Hunter-Milton similarity measure (Hunter & Milton, 2003). Event synchronization, however, adapts its timescale automatically; the user does not need to specify it. The same holds for SES: the timescale is determined by the parameter *s _{t}*, which is computed by the algorithm and does not need to be specified a priori. One just needs to choose initial values within the range of plausible values; the SES inference algorithm (cf. Table 2) then refines those initial estimates and selects the most appropriate one. Besides event synchronization and SES, other existing measures also avoid a fixed timescale, for example, the ISI distance (Kreuz et al., 2007).

In some applications, the user may prefer to use an automatic procedure to determine the timescale; in other applications, one may wish to investigate how the similarity depends on the timescale. For instance, the timescale may be chosen based on optimizing some desired quantity, such as classification fidelity, as in Victor and Purpura (1997). In event synchronization, one can fix the timescale instead of using the adaptive rule, equation 5.4. Likewise, in SES, one may fix *s _{t}* instead of estimating it.

#### 5.6.3. Delays.

There might be a delay between the two point processes *x* and *x*′. Before the classical measures can be applied, potential delays need to be estimated and the point processes shifted accordingly. SES directly handles delays and does not require a separate procedure to estimate delays. As a consequence, the estimates of *s _{t}* and ρ are robust against lags between the point processes, as we demonstrate in section 6.

#### 5.6.4. Matching.

The van Rossum and Schreiber et al. measures allow matching between a single event in one train and multiple events in the other, since the exponential kernel of an event in one train may overlap with the exponential kernels of multiple events in the other train. Similarly in event synchronization and the Hunter-Milton measure, an event may be matched with multiple events. In SES and the Victor-Purpura metric, each event can be coincident with at most one other event.

## 6. Analysis of Surrogate Data

Here we investigate the robustness and reliability of SES and the classical (dis)similarity measures reviewed in section 5. In order to benchmark the different measures, we apply them to surrogate data, for which the true values of event reliability and timing jitter are known and directly controllable. As far as we know, such a comparison using surrogate data has not been carried out yet. In the investigation of Tiesinga et al. (2008), the measures were applied to spike trains generated by a Hodgkin-Huxley-type model; for such models, the true values of event reliability and timing jitter are unknown.

We randomly generated 10,000 pairs of one-dimensional point processes (*x*, *x*′) according to the symmetric procedure depicted in Figure 1b. For definiteness, we assume that *x* and *x*′ are point processes in time. We considered several values of the parameters ℓ, *p _{d}*, δ_{t}, and *s _{t}* (σ_{t}). More specifically, the length ℓ was chosen as ℓ = ℓ_{0}/(1 − *p _{d}*), where ℓ_{0} is a constant. With this choice, the expected length of *x* and *x*′ is ℓ_{0}, which is independent of *p _{d}*. We considered the values ℓ_{0} = 40 and 100, *p _{d}* = 0, 0.1, …, 0.4, δ_{t} = 0 ms, 25 ms, 50 ms, and σ_{t} = 10 ms, 30 ms, and 50 ms. The parameter *T*_{0} was chosen as ℓ_{0} · 100 ms. The average spiking rate therefore is about 10 Hz for all choices of ℓ_{0} and *p _{d}*.

_{d}In the SES approach, we used the initial values , 30, and 70 and . The parameter β was identical for all settings of ℓ, *p _{d}*, δ

_{t}, and

*s*: β = 0.02. It was optimized to yield the best overall results. There are perhaps ways to determine β from a single pair of point processes, which would allow us to determine β for each setting of ℓ,

_{t}*p*, δ

_{d}_{t}, and

*s*separately; we leave this issue as a topic for further research. In practice, however, one often considers multiple point processes simultaneously. In section 7 we determine the SES parameters from multiple point processes and describe a method to determine the optimal parameter β.

The parameter *C _{V}* of the Victor-Purpura metric was set to 0.001 and 0.1 ms^{−1}. For the time constants τ_{R}, τ_{S}, and τ_{H} (cf. section 5), we considered the values 10 ms and 20 ms. For the time constant τ_{Q} (cf. section 5), we chose the values 20 ms and 40 ms. Those values of the different time constants seemed to yield the most reasonable results. Since we consider different lengths ℓ_{0}, we normalized the Victor-Purpura metric *D _{V}* by the number of events in both point processes: that is, we consider the normalized metric defined as

In order to assess the (dis)similarity measures, we compute for each parameter setting and for each measure *S* the expectation E[*S*] and the normalized standard deviation. Those statistics are computed by averaging over the 10,000 pairs of point processes (*x*, *x*′), randomly generated according to the symmetric procedure depicted in Figure 1b.
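These statistics can be sketched as follows; the normalization of the standard deviation by E[*S*] (a coefficient of variation) is our assumption, since the letter's formula is not reproduced above:

```python
from math import sqrt

def mean_and_normalized_std(values):
    """Expectation and normalized standard deviation of a measure over
    many surrogate pairs (normalization by the mean is assumed here)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n   # population variance
    return mean, sqrt(var) / mean
```

For instance, a measure that is identical on every pair has normalized standard deviation 0.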

### 6.1. Results.

Results for SES are summarized in Figure 6. From this figure, we can make the following observations:

- The estimates of *s _{t}* and ρ are biased, especially for small ℓ_{0} (in particular, ℓ_{0} = 40), *s _{t}* ⩾ (30 ms)^{2}, and *p _{d}* > 0.2; more specifically, the expected value of those estimates is smaller than the true value, which is due to the ambiguity inherent in event synchrony (cf. Figure 2). On the other hand, the estimates of δ_{t} are unbiased for all considered values of δ_{t}, *s _{t}*, and *p _{d}* (not shown here).
- The estimates of *s _{t}* only weakly depend on *p _{d}*, and vice versa.
- The estimates of *s _{t}* and ρ do not depend on δ_{t}; they are robust to lags δ_{t}, since the latter can be estimated reliably.
- The normalized standard deviation of the estimates of δ_{t}, *s _{t}*, and ρ grows with *s _{t}* and *p _{d}*, but it remains below 30% (not shown here). Those estimates are therefore reliable.
- The expected value of the estimates of *s _{t}* and ρ barely depends on the length ℓ_{0}. The normalized standard deviation of the SES parameters decreases as the length ℓ_{0} increases (not shown here), as expected.

In other words, the SES algorithm results in reliable estimates of the SES parameters *s _{t}* and ρ.

Results for the classical measures reviewed in section 5 are summarized in Figures 7 and 8. For clarity, we show only the results for δ_{t} = 0 in those figures. The influence of lags on classical measures will be investigated later in this section (see Figure 9a). We first consider the results for the Victor-Purpura distance metric (Victor & Purpura, 1997; Aronov, 2003; Victor et al., 2007), which are summarized in Figure 7. From that figure we can see the following:

- For *C _{V}* = 0.001 ms^{−1}, the distance metric grows with *p _{d}* and is practically independent of *s _{t}*. (This is the “intermediate” regime mentioned in section 5.1.) In this regime, the metric is proportional to the number of noncoincident events, and it behaves similarly to ρ; however, due to the ambiguity inherent in event synchrony (cf. Figure 2), for *p _{d}* > 0.2, it overestimates the number of coincident events and underestimates *p _{d}*. For larger *C _{V}*, in particular *C _{V}* = 0.1 ms^{−1}, the metric clearly depends on both *p _{d}* and *s _{t}*. It is noteworthy that the value *C _{V}* = 0.1 ms^{−1} depends on *T*_{0}/ℓ_{0} (the average distance between events in *x* and *x*′), which is 100 ms. If *C _{V}* = 0 ms^{−1} (not shown here), the metric is close to zero, since it is equal to the difference in length of both point processes *x* and *x*′, and in our experiments, both point processes have equal length on average. However, note that the metric is not exactly equal to zero, since the length of the sequences is identical only on average, not in every realization. The difference in length fluctuates more strongly as *p _{d}* increases; therefore, for *C _{V}* = 0 ms^{−1}, the metric increases (weakly) with *p _{d}*, independent of *s _{t}*, but remains close to zero. On the other hand, if *C _{V}* ≫ 1 ms^{−1} (not shown here), the metric is close to one, independent of *p _{d}* and *s _{t}*: in the transformation of one point process into the other, (almost) every event is either deleted or inserted, and therefore a cost of 1 is associated with (almost) every event.
- The normalized standard deviation of the normalized metric decreases with *p _{d}* and *s _{t}*, and it remains below 30% (not shown here). The estimates are therefore reliable.
- The expected value of the normalized metric does not depend on the length ℓ_{0}; however, its normalized standard deviation decreases as the length ℓ_{0} increases (not shown here).

The results for the van Rossum distance measure *D _{R}* (van Rossum, 2001) are summarized in Figure 8. From that figure one can see the following:

- The distance measure *D _{R}* grows with both *p _{d}* and *s _{t}*, as does the normalized Victor-Purpura metric (for *C _{V}* ≫ 0.001 ms^{−1}).
- The normalized standard deviation of *D _{R}* is largely independent of *p _{d}* and decreases with *s _{t}* (not shown here). Since it remains below 15%, the estimates of *D _{R}* are reliable.
- Like the Victor-Purpura metric, the expected value of *D _{R}* does not depend on the length ℓ_{0}; however, its normalized standard deviation decreases as the length ℓ_{0} increases (not shown here).
- As the time constant τ_{R} increases, the expected value of *D _{R}* and its normalized standard deviation decrease (not shown here). This can be explained as follows: the larger τ_{R}, the more the time series *s*(*t*) and *s*′(*t*) overlap, and hence the smaller *D _{R}*. Since there are generally also more coincident events for larger τ_{R}, the fluctuations of *D _{R}* are smaller. As a result, its normalized standard deviation becomes smaller.

We made similar observations for the Schreiber et al. measure *S _{S}* (Schreiber et al., 2003), the Hunter-Milton measure *S _{H}* (Hunter & Milton, 2003), and event synchronization *S _{Q}* (both with fixed and adaptive time constant τ_{Q}(*k*, *k*′); see equation 5.4).

The timing jitter is now drawn from a *Laplacian* distribution instead, where *w* is a scale parameter. The variance *s* of the Laplacian distribution is given by *s* = 2*w*^{2}; hence, the parameter *w* is related to the variance *s* as *w* = √(*s*/2).

More specifically, the generative process is as follows. First, one generates an event string *v* of length ℓ, where the events *v _{k}* are mutually independent and uniformly distributed in [0, *T*_{0}]. The strings *z* and *z*′ are generated by delaying *v* over −δ_{t}/2 and δ_{t}/2, respectively, and by (slightly) perturbing the resulting event occurrence times. Those perturbations are zero-mean Laplacian variables with variance *s _{t}*/2; the scale parameter *w* (cf. equation 6.2) is chosen accordingly as *w* = √(*s _{t}*/4). Next, some of the events in *z* and *z*′ are removed, resulting in the sequences *x* and *x*′; each event of *z* and *z*′ is removed with probability *p _{d}* (“deletion”), independent of the other events.
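This symmetric generative procedure with Laplacian jitter can be sketched as follows; the inverse-CDF sampler and the rounding of ℓ to an integer are implementation choices of ours:

```python
import math
import random

def generate_pair(l0, p_d, delta, s_t, T0, seed=0):
    """Symmetric surrogate procedure: a hidden train v of length
    l0/(1 - p_d), uniform on [0, T0]; two copies shifted by -delta/2
    and +delta/2, perturbed by zero-mean Laplacian jitter with
    variance s_t/2 (scale w = sqrt(s_t/4), since var = 2 w^2),
    then thinned with deletion probability p_d."""
    rng = random.Random(seed)
    l = round(l0 / (1.0 - p_d))
    v = sorted(rng.uniform(0.0, T0) for _ in range(l))
    w = math.sqrt(s_t / 4.0)
    def jitter():
        u = rng.random() - 0.5               # inverse-CDF Laplacian sample
        return -w * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    x = sorted(t - delta / 2.0 + jitter() for t in v if rng.random() > p_d)
    xp = sorted(t + delta / 2.0 + jitter() for t in v if rng.random() > p_d)
    return x, xp
```

With zero jitter, zero lag, and no deletions, the two generated trains coincide exactly, which is a convenient sanity check.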

We again considered the values ℓ_{0} = 40 and 100, *p _{d}* = 0, 0.1, …, 0.4, δ_{t} = 0 ms, 25 ms, 50 ms, and σ_{t} = 10 ms, 30 ms, and 50 ms, and the parameter *T*_{0} was again chosen as ℓ_{0} · 100 ms. The results are summarized in Figure 10. A comparison of Figure 10 with Figure 6 shows that the results for both generative processes are very similar. This suggests that SES can robustly quantify timing precision and event reliability.

### 6.2. Discussion.

From this study of surrogate data, we can conclude the following:

- SES and the classical measures considered in this letter are reliable in the sense that their statistical fluctuations are relatively small. Their normalized standard deviation is typically below 30%, and often even below 20%.
- The longer the point process, the more reliable the measures become, as expected.
- The classical measures are sensitive to lags. Therefore, one needs to estimate potential lags before those measures can be applied. As an illustration, Figure 9a shows how the Schreiber et al. measure *S _{S}* depends on δ_{t}; clearly, *S _{S}* drops as the delay δ_{t} increases. On the other hand, SES directly incorporates lags, and as a result, the estimates of *s _{t}* and ρ are robust to lags (cf. Figure 6). However, it is critical to choose an appropriate set of initial values: if one uses only a single initial value for the lag, the estimates of *s _{t}* and ρ become more sensitive to lags (not shown here). In other words, one of the initial values should be sufficiently close to the true lag, so prior information about potential lags is crucial for the success of the SES algorithm. If no such prior information is available, one needs to choose multiple initial values in a wide range; if the true lag falls within that range, the SES algorithm will most likely yield reliable estimates of *s _{t}* and ρ. On the other hand, if the true lag is far from all initial values, the estimates of *s _{t}* and ρ may not be reliable.
- Most classical measures depend on both *p _{d}* and *s _{t}*, and therefore they are unable to separate the two key aspects of synchrony: timing precision and event reliability. There is one exception: the (normalized) Victor-Purpura metric grows with *p _{d}* for small cost factors *C _{V}*, independent of *s _{t}* (cf. Figures 7a and 7c). The same holds for the SES parameter ρ (cf. Figures 6c and 6d); both are measures of event reliability. Note that ρ is robust to lags δ_{t}, in contrast to the Victor-Purpura metric. The SES parameter *s _{t}* is largely independent of *p _{d}* (cf. Figures 6a and 6b); it is a robust measure of timing dispersion. Interestingly, the parameters *s _{t}* and ρ seem to quantify timing precision and event reliability, respectively, even if the data at hand are generated from a model that differs from the SES model (cf. Figure 1b). We wish to point out once more, however, that all (dis)similarity measures for one-dimensional point processes underestimate the timing dispersion and the number of event deletions, due to the ambiguity inherent in event synchrony (cf. Figure 2).
- There exists a classical procedure to estimate the timing dispersion based on the Schreiber et al. measure *S _{S}* (see, e.g., Tiesinga et al., 2008): one computes *S _{S}* for a range of values of τ_{S}, and the value of τ_{S} at which *S _{S}* = 0.5 is considered an estimate σ_{S} of the timing dispersion. Similarly, one may determine the timing dispersion from other classical measures, such as the Hunter-Milton similarity measure. It is important to realize, however, that since the classical measures significantly depend on *p _{d}* (with the exception of the Victor-Purpura distance for sufficiently small *C _{V}*), the resulting estimates of timing dispersion will also significantly depend on *p _{d}*. This is illustrated in Figure 9b, which shows that both the similarity measure *S _{S}* and the timing dispersion estimate σ_{S} significantly depend on *p _{d}*. For example, σ_{S} is equal to 12 ms for both parameter settings (σ_{t} = 30 ms, *p _{d}* = 0.4) and (σ_{t} = 50 ms, *p _{d}* = 0.1); in other words, σ_{S} is not a reliable measure of timing dispersion, and the same holds for similar estimates derived from, for example, the Hunter-Milton similarity measure. In contrast, the estimate of the SES parameter *s _{t}* does not suffer from those shortcomings (see Figures 6a and 6b).
- SES is significantly more computationally complex than some classical similarity measures, such as the Hunter-Milton similarity measure. In principle, the complexity of the SES inference algorithm scales quadratically with the sequence length; without further modifications, the SES algorithm is practical only for sequences of length 100 or less. However, a very reasonable approach to limit the computational complexity is to consider only pairs of events that are sufficiently close to each other. For example, in the application at hand, it is not likely that two events with a lag of more than 500 ms form an event pair; such pairs can therefore be discarded a priori in the SES algorithm. The complexity then becomes linear in the sequence length, and the SES algorithm remains practical for sequences of length 1000 and more.
- The SES algorithm leads to reliable estimates of *s _{t}* and ρ only if the parameter β is appropriately chosen. In the application at hand, β was fixed for all parameter settings, and we chose the value of β that resulted in the most reliable estimates. In section 7, we propose a technique to determine β from multiple given point processes.

## 7. Application: Firing Reliability of a Neuron

In this section, we investigate an application related to neuroscience. In particular, we apply SES to quantify the firing reliability of neurons. We consider the Morris-Lecar neuron model (Morris & Lecar, 1981), which exhibits properties of type I and II neurons (Gutkin & Ermentrout, 1998; Tsumoto, Kitajima, Yoshinaga, Aihara, & Kawakami, 2006; Tateno & Pakdaman, 2004). The spiking behavior differs in both neuron types, as illustrated in Figures 12 and 13. In type II neurons, the timing jitter is small, but spikes tend to drop out. In type I neurons, fewer spikes drop out, but the dispersion of spike times is larger. In other words, type II neurons prefer to stay coherent or to be silent; type I neurons follow the middle course between those two extremes (Robinson, 2003).

This difference in spiking behavior is due to the way periodic firing is established (Gutkin & Ermentrout, 1998; Tsumoto et al., 2006; Tateno & Pakdaman, 2004; Robinson, 2003). In type I neurons, periodic firing results from a saddle-node bifurcation of equilibrium points. Such neurons show a continuous transition from zero frequency to arbitrarily low firing frequencies. Pyramidal cells are believed to be type I neurons. In type II neurons, periodic firing occurs through a subcritical Hopf bifurcation. Such neurons show an abrupt onset of repetitive firing at a higher firing frequency; they cannot support regular low-frequency firing. Squid giant axons and the Hodgkin-Huxley model are type II.

In the following section, we describe the Morris-Lecar neuron model in more detail. In section 7.2.1, we apply both SES and classical (dis)similarity measures to quantify the firing reliability of both types of neurons and discuss how the difference in spiking behavior is reflected in those measures.

### 7.1. Morris-Lecar Neuron Model.

where *M*_{∞}, *N*_{∞}, and λ_{N} are the following functions:

Depending on the parameters of the system, the M-L neuron model behaves as a type I or II neuron. Gutkin and Ermentrout (1998) have determined a setting of the system parameters for each type. Table 3 lists parameter values that are different in the two classes, and Table 4 lists common parameter values. The analysis of Gutkin and Ermentrout (1998) was further refined in Tsumoto et al. (2006) and Tateno and Pakdaman (2004). However, for our purposes, the parameter setting of Tables 3 and 4 suffices.

| Parameter | Value |
|---|---|
| *C _{M}* | 5 [μF/cm^{2}] |
| *g*_{K} | 8 [μS/cm^{2}] |
| *g*_{L} | 2 [μS/cm^{2}] |
| *V*_{Ca} | 120 [mV] |
| *V*_{K} | −80 [mV] |
| *V*_{L} | −60 [mV] |
| *V*_{1} | −1.2 [mV] |
| *V*_{2} | 18 [mV] |


Note: This parameter setting is used in both types of neurons.
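For readers who wish to reproduce qualitatively similar spike trains, a forward-Euler sketch of the Morris-Lecar equations in their standard form is given below. The common parameters follow the table above; the type-dependent parameters (*g*_{Ca}, *V*_{3}, *V*_{4}, and the rate constant) are not listed there, so the values used here are common textbook choices, not necessarily those of Gutkin and Ermentrout (1998):

```python
import math

def simulate_ml(i_ext, t_end, dt=0.05,
                g_ca=4.0, v3=12.0, v4=17.4, phi=1.0 / 15.0):
    """Forward-Euler integration of the standard Morris-Lecar equations.
    Common parameters follow the table above; g_ca, v3, v4, and phi are
    type-dependent placeholders, not the letter's values."""
    c_m, g_k, g_l = 5.0, 8.0, 2.0
    v_ca, v_k, v_l = 120.0, -80.0, -60.0
    v1, v2 = -1.2, 18.0
    v, n = -60.0, 0.0                        # membrane potential, K+ gate
    trace = []
    t = 0.0
    while t < t_end:
        m_inf = 0.5 * (1.0 + math.tanh((v - v1) / v2))
        n_inf = 0.5 * (1.0 + math.tanh((v - v3) / v4))
        lam_n = phi * math.cosh((v - v3) / (2.0 * v4))
        i_ion = (g_ca * m_inf * (v - v_ca) + g_k * n * (v - v_k)
                 + g_l * (v - v_l))
        v += dt * (i_ext(t) - i_ion) / c_m
        n += dt * lam_n * (n_inf - n)
        trace.append(v)
        t += dt
    return trace
```

A sinusoidal input current with additive noise, as described below for *I*_{ext}, can be passed as the `i_ext` callable; spike times are then obtained by thresholding the returned trace.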

The input current *I*_{ext} is the sum of a sinusoidal component and a noise term *n*(*t*), where *n*(*t*) is zero-mean white gaussian noise with variance σ^{2}_{n}. Figure 11 shows a realization of *I*_{ext}. The sinusoidal component forces the neuron to spike regularly; however, the precise timing varies from trial to trial due to the noise *n*(*t*). Our objective is to investigate how the noise affects the spike timing and the tendency to drop spikes. We are especially interested in how the effect of noise differs in both neuron types. The parameter settings for the input current *I*_{ext} are listed in Table 5. We have chosen the parameters such that we obtain the typical spiking behavior of both types of neurons, as described in Robinson (2003). Figure 12 shows the membrane potential *V*, equation 7.1, for five trials. By thresholding *V*, we obtain the raster plots of Figure 13; we show 50 trials.

### 7.2. Results.

#### 7.2.1. Stochastic Event Synchrony.

We computed the SES parameters for each pair of the 50 trials and for different values of β. Next, we averaged those parameters over all pairs; since there are 50 trials, there are 1225 such pairs. A similar approach was followed in Haas and White (2002), Schreiber et al. (2003), and Hunter and Milton (2003). In order to overcome local extrema, we use multiple initial values for *s _{t}*: (3 ms)^{2}, (5 ms)^{2}, (7 ms)^{2}, and (9 ms)^{2}, among others. Each initialization may lead to a different solution; we choose the most probable one—the solution with the largest posterior value.

Note that instead of considering all 1225 pairs of trials, an arguably more elegant approach would be to consider all 50 trials jointly. As we pointed out earlier, SES can indeed be extended to collections of point processes, but this goes beyond the scope of this letter and the companion letter.

Figures 14a and 14b show how the average *s _{t}* (σ_{t}) and ρ, respectively, depend on β for both neuron types. Figure 14c shows *s _{t}* (σ_{t}) as a function of ρ for several values of β. The “optimal” values of (β, *s _{t}*, ρ) are indicated by arrows; later we will explain how we determined those values. From those three figures, it becomes immediately clear that the parameter ρ is significantly smaller in type I than in type II neurons (for β ∈ [10^{−10}, 10^{−1}]). In contrast, *s _{t}* is vastly larger. This agrees with our intuition: since in type II neurons spikes tend to drop out, ρ should be larger; but since the timing dispersion of the spikes in type I neurons is larger, we expect *s _{t}* to be larger in those neurons.

Figures 14a to 14c show the pair (*s _{t}*, ρ) for various values of β. Of course, we eventually want to describe the firing reliability by one pair (*s _{t}*, ρ), but how should we select β? If we choose β too small, some noncoincident events will be treated as coincident events; they will be matched with other events, resulting in large offsets. As a consequence, the distribution of the offsets will have a significant number of outliers, which leads to an inconsistency: in the model *p*(*x*, *x*′, *b*, *b*′, δ_{t}, *s _{t}*, ℓ), equation 3.25, this distribution is supposed to be gaussian, which cannot capture the large number of outliers. In addition, due to those outliers, the parameter *s _{t}* will be unreasonably large. As can be seen from Figure 14a, this occurs for type II neurons when β < 10^{−10}. Figure 14c shows this threshold phenomenon more clearly: there are two distinct regimes in the *s _{t}*-ρ curve. This is most obvious for the type II neuron, but it also occurs in the type I neuron: the slope of its *s _{t}*-ρ curve is larger in the region ρ < 0.03 than in the region ρ > 0.03.

But if β is too large, some coincident event pairs will no longer be matched, and those events will be treated as noncoincident events. As a result, the distribution of the offsets will have lighter tails than the gaussian distribution; the parameter *s _{t}* will then be too small and ρ unreasonably large. This occurs in both neuron types for β > 0.1 (cf. Figures 14a and 14b).

From this intuitive reasoning, we expect there is an optimal value of β. This is confirmed in Figures 15 and 16, which show quantile-quantile plots of the offset distribution for various values of β. If the offset distribution were exactly gaussian, the data quantiles would lie on the straight dashed lines. One can clearly see deviations from the straight lines for small and large values of β. Figure 17 shows the average deviation from the straight line as a function of β, which is a measure of how much the offset distribution differs from a gaussian distribution. The values of β with the smallest deviations are 10^{−3} and 0.03 for type I and type II neurons, respectively, which correspond to (*s _{t}*, ρ) = ((15.2 ms)^{2}, 0.029) and (*s _{t}*, ρ) = ((2.7 ms)^{2}, 0.27), respectively. For those values of β, the data quantiles practically coincide with the straight line; the offset distribution may therefore be considered gaussian, and the model *p*(*x*, *x*′, *b*, *b*′, δ_{t}, *s _{t}*, ℓ), equation 3.25, is then self-consistent.

We also applied this technique for determining β to single pairs of point processes (cf. section 6) but did not obtain satisfactory results: the method needs a sufficient number of (coincident) events in order to be reliable. Therefore, we decided to fix the parameter β in the experiments in section 6 rather than optimize over it.
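The above consistency check can be sketched in code: for each candidate β, collect the offsets of the coincident pairs and measure how far their empirical quantiles deviate from those of a fitted gaussian, then retain the β with the smallest deviation. A minimal sketch in Python (function names are ours; `offsets_for_beta` is a hypothetical stand-in for running the SES algorithm of Table 2 at a given β):

```python
import numpy as np
from statistics import NormalDist  # stdlib gaussian quantiles

def gaussian_qq_deviation(offsets):
    """Average absolute deviation between the empirical quantiles of the
    offsets and the quantiles of a gaussian with matched mean and variance,
    normalized by the standard deviation. Small values indicate that the
    offset distribution is close to gaussian."""
    x = np.sort(np.asarray(offsets, dtype=float))
    n = len(x)
    mu, sigma = x.mean(), x.std(ddof=1)
    probs = (np.arange(1, n + 1) - 0.5) / n          # Q-Q plotting positions
    ref = np.array([NormalDist(mu, sigma).inv_cdf(p) for p in probs])
    return float(np.mean(np.abs(x - ref)) / sigma)

def select_beta(betas, offsets_for_beta):
    """Retain the beta whose offset distribution looks most gaussian.
    `offsets_for_beta` maps a beta to the offsets of the coincident pairs
    (in the letter, those offsets come from the algorithm of Table 2)."""
    devs = {b: gaussian_qq_deviation(offsets_for_beta(b)) for b in betas}
    return min(devs, key=devs.get), devs
```

For too small a β the offsets contain outliers and the deviation grows; for too large a β the tails become too light; the retained β minimizes the deviation, as in Figure 17.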

We assessed the estimates of (*s _{t}*, ρ) by bootstrapping (Efron & Tibshirani, 1993). More precisely, for both types of neurons, we generated 1000 sets of 50 spike trains. Those sets of spike trains were generated along the lines of the symmetric procedure of Figure 1b. First, we generate a hidden process *v* with length ℓ = 40/(1 − *p _{d}*) and equidistant events *v _{k}*. Then we generate 50 noisy copies of *v* by slightly perturbing the timing of the events *v _{k}* (with noise variance *s _{t}*/2) and deleting some of the events (with probability *p _{d}*). The delay δ_{t} was set equal to zero. We carried out this procedure for type I neurons with (*s _{t}*, ρ) = ((15.2 ms)^{2}, 0.029) and for type II neurons with (*s _{t}*, ρ) = ((2.7 ms)^{2}, 0.27), which are the estimates obtained by the SES inference procedure. Next we applied the SES algorithm of Table 2 to those sets of point processes; the parameter β was set equal to 10^{−3} and 0.03 for type I and type II neurons, respectively, and we chose the initial values δ_{t}^{(0)} = 0 ms and *s _{t}*^{(0)} = (30 ms)^{2}. The results of this analysis are summarized in Table 6. Since the expected values of *s _{t}* and ρ agree very well with the true values and the normalized standard deviations are small (<15%), it is reasonable to believe that the estimates (*s _{t}*, ρ) = ((15.2 ms)^{2}, 0.029) and (*s _{t}*, ρ) = ((2.7 ms)^{2}, 0.27) for type I and type II neurons, respectively, are accurate.

Statistics | Type I | Type II |
---|---|---|
E[*s _{t}*] | 15.3 | 2.70 |
σ[*s _{t}*]/E[*s _{t}*] | 1.8% | 1.8% |
E[ρ] | 0.0283 | 0.273 |
σ[ρ]/E[ρ] | 12% | 3.1% |


Notes: The table shows the expected values E[*s _{t}*] and E[ρ], together with the normalized standard deviations σ[*s _{t}*]/E[*s _{t}*] and σ[ρ]/E[ρ]. The expected values practically coincide with the actual estimates, and the normalized standard deviations are small. Therefore, the SES estimates (*s _{t}*, ρ) may be considered reliable.

For completeness, we show in Figure 18 a histogram of the number of iterations required for the SES algorithm of Table 2 to converge. In each of those iterations, one updates the sequences (*j*, *j*′) and the SES parameters (cf. Table 2). The histogram of Figure 18 was computed over all pairs of trials of both types of neurons and for all values of β considered in Figure 14a. From the histogram, we can see that the algorithm converged after at most 19 iterations, and on average after about three iterations. We allowed a maximum of 30 iterations; therefore, from Figure 18, we can conclude that the algorithm always converged in our experiments.

#### 7.2.2. Classical Measures.

Besides SES, we also applied the classical methods reviewed in section 5. The results are summarized in Figure 19. From the figure, it can be seen that the similarity measures *S _{S}*, *S _{H}*, and *S _{Q}* are larger for type II neurons than for type I neurons if the time constants τ_{S}, τ_{H}, and τ_{Q} are small. For large time constants, the opposite holds. This can be explained as follows. Since the timing dispersion in type I neurons is fairly large, many spikes of type I neurons will be treated as noncoincident (nonoverlapping) if the time constants τ_{S}, τ_{H}, and τ_{Q} are small. If those time constants are large, most spikes of type I neurons will be considered coincident (overlapping). In contrast, type II neurons have high timing precision, and therefore the similarity measures *S _{S}*, *S _{H}*, and *S _{Q}* grow quickly with the time constants τ_{S}, τ_{H}, and τ_{Q}. However, those measures converge to relatively small values: due to the large number of dropouts in the spike trains of type II neurons, a substantial number of spikes are treated as noncoincident, and therefore, as the time constants grow, the similarity measures *S _{S}*, *S _{H}*, and *S _{Q}* attain smaller values than in type I neurons. The results of the (normalized) Victor-Purpura distance metric and the van Rossum distance metric *D _{R}* can be understood along the same lines.

As we pointed out earlier, SES adjusts its timescale automatically. The same holds for event synchronization (Quian Quiroga, Kreuz, et al., 2002): one may adapt the time constant τ_{Q} according to equation 5.4. With this adaptation rule for τ_{Q}, we obtained *S _{Q}* = 0.96 for type I neurons and *S _{Q}* = 0.83 for type II neurons. This can be understood as follows: since the adaptive time constant τ_{Q} is typically about 50 ms or larger, the value of *S _{Q}* is lower in type II neurons due to the frequent dropouts in their spike trains.

Finally, we consider a classical similarity measure *S*_{ISI} for multiple point processes, introduced in Tiesinga and Sejnowski (2004) (see also Tiesinga et al., 2008); it is based on interspike intervals (ISIs). The first step is to merge the spike times across all trials. Next, the interspike intervals of this merged sequence are calculated, and the coefficient of variation of the aggregated response (CVP) is computed as the standard deviation of the interspike intervals divided by their mean. The similarity measure *S*_{ISI} is then obtained by subtracting 1 from the CVP and dividing by the square root of the number of trials. We obtained *S*_{ISI} = 0.25 and *S*_{ISI} = 0.64 for type I and type II neurons, respectively. Since *S*_{ISI} mostly captures timing precision and is less sensitive to dropouts, we indeed expect it to attain a larger value for type II neurons than for type I neurons.
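The recipe above translates directly into code; a minimal sketch in Python (the function name is ours):

```python
import numpy as np

def s_isi(trains):
    """ISI-based similarity of Tiesinga and Sejnowski (2004): merge the
    spike times across all trials, compute the coefficient of variation CVP
    of the interspike intervals of the merged sequence (standard deviation
    divided by mean), subtract 1, and divide by the square root of the
    number of trials."""
    merged = np.sort(np.concatenate([np.asarray(t, dtype=float)
                                     for t in trains]))
    isi = np.diff(merged)                      # interspike intervals
    cvp = isi.std() / isi.mean()               # CV of the aggregated response
    return float((cvp - 1.0) / np.sqrt(len(trains)))
```

For perfectly repeating trains, the merged ISIs are bimodal (many near-zero intervals plus the inter-event gaps), so CVP ≫ 1 and *S*_{ISI} is large; for unrelated Poisson-like trains, the merged ISIs are close to exponential, CVP ≈ 1, and *S*_{ISI} ≈ 0.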

### 7.3. Discussion.

This analysis underlines an important issue: most classical measures depend on a time constant, and in some practical situations, it is not obvious how to choose the optimal value of those time constants. Indeed, Figure 19 suggests that one should compute the measures for a range of values of the time constants. As a result, one obtains not a single measure of similarity but a similarity function *S*(τ). Such a function may not always be easy to interpret, compare, or manipulate in practice. Event synchronization and SES, in contrast, are able to determine the appropriate timescale automatically.

However, in some applications, one may wish to investigate how the similarity depends on the timescale. In event synchronization and SES, the timescale can also be fixed by hand; both measures can therefore be computed for a range of timescales.

## 8. Conclusion

We have presented an alternative method to quantify the similarity of two time series, referred to as stochastic event synchrony (SES). The first step is to extract events from both time series, resulting in two point processes. The events in those point processes are then aligned. The better the alignment, the more similar the original time series are considered to be. In this letter, we focused on one-dimensional point processes.

Obviously it is important to extract meaningful events from the given time series. The proposed method may be expected to produce useful results only if the events characterize the time series in a suitable manner. In the case of spike trains, individual spikes can naturally be considered as events. Note that for certain neurons, however, it may actually be more appropriate to define a burst of spikes as a single event.

We compared SES to classical (dis)similarity measures for one-dimensional point processes. Through the analysis of surrogate data, we observed that most classical (dis)similarity measures are unable to distinguish timing dispersion from event reliability: they depend on both quantities. In contrast, SES allows both aspects of synchrony to be quantified separately. We also wish to reiterate that all (dis)similarity measures, both the classical measures and SES, typically underestimate the timing dispersion and overestimate event reliability; this is due to the ambiguous nature of the synchrony of one-dimensional point processes.

This ambiguity may be resolved by incorporating additional information about the events. For example, in the case of spikes, one may take the shape of the spikes into account. The point processes then become multidimensional. In the companion letter in this issue, we describe how SES may be extended to multidimensional point processes. In that setting, the event pairs are no longer assumed to be ordered, in contrast to the formulation of SES here (see section 3).

Finally, we address an interesting topic for future research. The SES parameters are determined by coordinate descent, which is guaranteed to converge to a stationary point of the posterior distribution of the SES parameters. However, it does not necessarily converge to the maximum of that distribution, which corresponds to the maximum a posteriori (MAP) estimates of the SES parameters. Instead of trying to obtain the MAP estimates (by coordinate descent or other techniques), one may (approximately) compute the posterior distribution of the SES parameters by means of Monte Carlo algorithms such as Gibbs sampling or other Markov chain Monte Carlo methods. From that (approximate) posterior distribution, one may be able to obtain more reliable estimates of the SES parameters. In addition, whereas the proposed approach is mostly practical when the prior for the number of events is a geometric distribution, Monte Carlo methods can easily deal with other priors, such as Poisson distributions. However, such Monte Carlo approaches would be substantially slower than the proposed algorithm based on coordinate descent.

## Appendix: Derivation of Inference Algorithm for One-Dimensional SES

Here we derive the algorithm of Table 2. More specifically, we clarify how to carry out the updates 4.2 and 4.3.

Update 4.3 amounts to the closed-form expressions for the estimates of the mean and variance of a gaussian distribution, evaluated on the offsets of the event pairs aligned at iteration *i* + 1:

δ_{t}^{(i+1)} = (1/*n*^{(i+1)}) Σ_{k=1}^{n^{(i+1)}} (*x*′_{ĵ′_{k}^{(i+1)}} − *x*_{ĵ_{k}^{(i+1)}})

*s _{t}*^{(i+1)} = (1/*n*^{(i+1)}) Σ_{k=1}^{n^{(i+1)}} (*x*′_{ĵ′_{k}^{(i+1)}} − *x*_{ĵ_{k}^{(i+1)}} − δ_{t}^{(i+1)})^{2},

where *x*_{ĵ_{k}^{(i+1)}} is the ĵ_{k}^{(i+1)}th event of *x*, and ĵ^{(i+1)} is the estimate of *j* at iteration *i* + 1. Likewise, *x*′_{ĵ′_{k}^{(i+1)}} is the ĵ′_{k}^{(i+1)}th event of *x*′, and *n*^{(i+1)} is the number of coincident pairs at iteration *i* + 1.
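A compact sketch of this parameter update, under our reading of update 4.3 (δ_{t} re-estimated as the mean offset of the aligned pairs, *s _{t}* as the variance of the offsets around it):

```python
import numpy as np

def update_parameters(x, xp, pairs):
    """Re-estimate delta_t and s_t from the current alignment: delta_t is
    the mean offset of the coincident event pairs, and s_t is the variance
    of the offsets around delta_t (the standard estimates of a gaussian
    mean and variance). `pairs` holds 1-based index pairs (j_k, j'_k)."""
    offsets = np.array([xp[kp - 1] - x[k - 1] for k, kp in pairs])
    delta_t = float(offsets.mean())
    s_t = float(((offsets - delta_t) ** 2).mean())
    return delta_t, s_t
```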

Update 4.2 can readily be carried out by applying the Viterbi algorithm (Forney, 1973) (“dynamic programming”) on a trellis with the pairs of coincident events as states, or equivalently, by applying the max-product algorithm on a cycle-free factor graph (Loeliger, 2004; Loeliger et al., 2007) of *p*(*x*, *x*′, *j*, *j*′, δ_{t}, *s _{t}*). The procedure is equivalent to dynamic time warping (Myers & Rabiner, 1981); it is, for example, used in the context of bio-informatics to compute the distance between genetic sequences (Sellers, 1974, 1979). It is also applied in neuroscience to compute various spike metrics (Victor & Purpura, 1997; Aronov, 2003; Victor et al., 2007).

As a first step in that procedure, one arranges the sequences *x* and *x*′ on the sides of an (*n* + 1) × (*n*′ + 1) grid (see Figure 5). Note that we assume, without loss of generality, that the sequences *x* and *x*′ are ordered: *x _{k}* ⩾ *x*_{k−1} and *x*′_{k} ⩾ *x*′_{k−1}. An alignment corresponds to a path on the grid; in particular, the alignment 4.2 corresponds to the minimal-cost path. Note that each path starts at (0, 0) and ends at (*n*, *n*′). In addition, it never turns back; in other words, the indices *q _{k}* and *q*′_{k} never decrease, since the event sequences are assumed to be ordered (cf. section 3). Moreover, those indices increase by at most 1 at each step along the path. As a result, each path contains three kinds of segments, from (*q _{k}*, *q*′_{k}) to (*q*_{k+1}, *q*′_{k+1}), all of length 1:

Horizontal: (*q*_{k+1}, *q*′_{k+1}) = (*q _{k}* + 1, *q*′_{k})

Vertical: (*q*_{k+1}, *q*′_{k+1}) = (*q _{k}*, *q*′_{k} + 1)

Diagonal: (*q*_{k+1}, *q*′_{k+1}) = (*q _{k}* + 1, *q*′_{k} + 1).

The minimal-cost path may be computed by means of an (*n* + 1) × (*n*′ + 1) cost matrix *M*. The first row and column of *M* are filled with zeroes—the elements *M*_{k,0} = 0 = *M*_{0,k′} (for *k* = 0, 1, …, *n* and *k*′ = 0, 1, …, *n*′). The other elements *M*_{k,k′} are computed according to the recursion A.4, for *k* = 1, …, *n* and *k*′ = 1, …, *n*′. Obviously, in order to compute the cost *M*_{k,k′}, the costs *M*_{k−1,k′}, *M*_{k,k′−1}, and *M*_{k−1,k′−1} need to have been computed previously. To this end, one may first compute *M*_{1,1} and then gradually fill the rest of the matrix *M*. The minimal cost is eventually given by *M*_{n,n′}. The corresponding path and alignment may be traced back from the options chosen at each stage in the recursion (see equation A.4). The first choice corresponds to treating *x _{k}* as a noncoincident event (horizontal segment), the second choice corresponds to treating *x*′_{k′} as a noncoincident event (vertical segment), and the third choice corresponds to treating (*x _{k}*, *x*′_{k′}) as an event pair (diagonal segment). Combining the updates A.1 and A.3 with the recursion A.4 leads to the algorithm of Table 2.

Note that if the event sequences are not assumed to be ordered, the paths on the grid may turn back, and the minimal-cost path may no longer be found by this simple dynamic programming procedure.
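The grid recursion described above can be sketched in Python. The diagonal (pairing) cost used here, (*x*′_{k′} − *x _{k}* − δ_{t})²/(2*s _{t}*) − 2β, is our illustrative reading of the recursion: pairing two events removes the penalty β of two noncoincident events, so events are paired only when their offset is small enough; the exact costs follow from equation 3.25 and recursion A.4.

```python
import numpy as np

def align(x, xp, delta_t, s_t, beta):
    """Minimal-cost alignment of two ordered event sequences by dynamic
    programming on an (n+1) x (n'+1) grid. The first row and column of the
    cost matrix M are zero; a horizontal (vertical) segment leaves x_k
    (x'_k') noncoincident, and a diagonal segment pairs (x_k, x'_k').
    The diagonal cost is an illustrative choice, not the letter's exact
    expression. Returns the minimal cost and the list of coincident pairs
    (1-based indices)."""
    n, m = len(x), len(xp)
    M = np.zeros((n + 1, m + 1))                 # first row and column: zeroes
    choice = np.zeros((n + 1, m + 1), dtype=int)
    for k in range(1, n + 1):
        for kp in range(1, m + 1):
            d = (xp[kp - 1] - x[k - 1] - delta_t) ** 2 / (2.0 * s_t)
            opts = (M[k - 1, kp],                # horizontal: x_k noncoincident
                    M[k, kp - 1],                # vertical: x'_k' noncoincident
                    M[k - 1, kp - 1] + d - 2.0 * beta)  # diagonal: pair events
            choice[k, kp] = int(np.argmin(opts))
            M[k, kp] = opts[choice[k, kp]]
    pairs, k, kp = [], n, m                      # trace the path back
    while k > 0 and kp > 0:
        c = int(choice[k, kp])
        if c == 2:
            pairs.append((k, kp))
        k, kp = k - (c != 1), kp - (c != 0)
    pairs.reverse()
    return float(M[n, m]), pairs
```

With a large β, nearly all events are paired (small ρ); with a small β, pairing becomes unattractive and events are left noncoincident, which mirrors the threshold behavior of β discussed in section 7.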

## Acknowledgments

Results of this letter were in part reported in Dauwels et al. (2007). The authors thank Zhe (“Sage”) Chen (Harvard University), Kenji Morita (RIKEN Brain Science Institute), Yuichi Sakumura (NAIST), Carlos Rodriguez (University at Albany), Sujay Sanghavi (MIT), and Yuki Tsukada (NAIST) for helpful discussions. We are grateful to participants of the Retreat of the MIT Picower Center for Learning and Memory (May 2007), the RIKEN Symposium “Brain Activity and Information Integration” (September 2007), and the NIPS Workshop “Large-Scale Brain Dynamics” (December 2007; Whistler, Canada) for numerous inspiring questions and comments. J.D. is deeply indebted to Shun-ichi Amari (RIKEN Brain Science Institute) and Andi Loeliger (ETH Zurich) for continuing support and encouragement over the last years.

J.D. was supported in part by postdoctoral fellowships from the Japanese Society for the Promotion of Science, the King Baudouin Foundation, and the Belgian American Educational Foundation. T.W. was supported in part by NSF grant DMI-0447766. Part of this work was carried out while J.D. and T.W. were at the RIKEN Brain Science Institute, Saitama, Japan.