Current spike sorting methods focus on clustering neurons' characteristic spike waveforms. The resulting spike-sorted data are typically used to estimate how covariates of interest modulate the firing rates of neurons. However, when these covariates do modulate the firing rates, they provide information about spikes' identities, which thus far have been ignored for the purpose of spike sorting. This letter describes a novel approach to spike sorting, which incorporates both waveform information and tuning information obtained from the modulation of firing rates. Because it efficiently uses all the available information, this spike sorter yields lower spike misclassification rates than traditional automatic spike sorters. This theoretical result is verified empirically on several examples. The proposed method does not require additional assumptions; only its implementation is different. It essentially consists of performing spike sorting and tuning estimation simultaneously rather than sequentially, as is currently done. We used an expectation-maximization maximum likelihood algorithm to implement the new spike sorter. We present the general form of this algorithm and provide a detailed implementable version under the assumptions that neurons are independent and spike according to Poisson processes. Finally, we uncover a systematic flaw of spike sorting based on waveform information only.
Extracellular electrodes are commonly used to monitor populations of neurons. The signal collected at an electrode is a mixture of activities from different neurons, corrupted by noise. Spike sorting consists of finding out how many neurons contributed to the recorded data and determining which neurons produced which spikes. Many spike-sorting solutions have been proposed, all based on clustering characteristic features of the recorded spike waveforms (see Lewicki, 1998, for a review). Spike-sorting papers generally focus on two broad problems: feature selection and clustering techniques. Features can be raw waveform measurements or projections onto lower-dimensional spaces, such as principal components. Full or reduced measurements are numerical vectors that can be treated as points in a high-dimensional space. Spike sorting is achieved by identifying clusters in this cloud of points, which correspond to single neurons. Clustering techniques are many and range from ad hoc procedures (e.g., Fee, Mitra, & Kleinfeld, 1996), nonparametric techniques (e.g., k-means; Salganicoff, Sarna, Sax, & Gerstein, 1988), neural networks (Ohberg, Johansson, Bergenheim, Pedersen, & Djupsjobacka, 1996), to model-based clustering using mixtures of distributions (e.g., Lewicki, 1994; Sahani, Pezaris, & Andersen, 1997; Shoham, Fellows, & Normann, 2003). In this letter, we focus on the latter approach, often referred to as automatic spike sorting, because it requires less subjective judgment to classify spikes, provides a statistically complete and efficient solution to the spike clustering problem, and is well suited to process many recording electrodes in a short amount of time. This approach has been evaluated in an experiment and validated with simultaneous intracellular and extracellular recordings, and it appears to perform well in many situations (Harris, Henze, Csicsvari, Hirase, & Buzsaki, 2000).
Spike sorting is typically the preliminary step to understanding how neurons represent information about covariates such as stimuli in sensory studies, behavioral correlates in motor studies, or, more generally, environmental parameters. This often involves estimating firing rates, tuning curves or functions, receptive fields, and so on as functions of these covariates, using the spike-sorted data. However, when these covariates modulate the neurons' firing rates, they too provide information about spike identities, which thus far has been ignored for the purpose of spike sorting. To see this, imagine an electrode that records two neurons whose tuning curves are modulated by some covariate c. Imagine that the first neuron spikes only when c1 < c < c2 and that the second neuron spikes only when c2 < c < c3, so they never spike together. Traditional spike sorters will assign a spike recorded at the electrode to one of the two neurons based on features of the recorded waveform, which might lead to a misclassification error if the neurons' waveform clusters overlap. But we can classify spikes with perfect confidence if we use c for spike sorting. Indeed, if we observe that c1 < c < c2 when a spike is detected at the electrode, then the spike must have been produced by neuron 1. If c2 < c < c3, then it is neuron 2 that must have spiked.
We propose a new automatic spike sorter that combines spike waveform information and tuning information. Because it uses more information, it yields lower spike misclassification rates than traditional automatic spike sorters do. The proposed method does not require additional assumptions; only its implementation is different. It essentially consists of performing spike-sorting and tuning estimation simultaneously rather than sequentially, as is currently done. We present the general form of the EM algorithm we developed to estimate the new spike sorter and provide a detailed implementable version under the assumptions that neurons are independent and spike according to Poisson processes. We illustrate the new sorter and its properties based on several simulated experiments.
Consider a single electrode that records I neurons. We collect the electrode spike train (EST) by thresholding its bandpassed voltage signal. When a spike is detected at time t, we record waveform measurements, which we denote by at. The methods presented in this letter apply generally, so at can be the raw waveform or any reduction, such as principal components (PCs). We also record the values ct of covariates thought to modulate neurons that are spiking. For example, in section 3.1, ct = dt is the angular direction of a moving hand in a movement decoding experiment, while in section 3.3, ct = t is the experimental time of a designed experiment. Once the electrode spike train is recorded, we divide time in bins of length δ that are small enough so that at most one spike can occur in a bin. Our illustrations all use δ = 1 msec. Then our observed data are a spike train z = (zt, t = 1, …, T), where zt = 1 (0) indicates that the electrode voltage did (did not) exceed the threshold at time t, the recorded waveform measurements a = (at, t = 1, …, T)1, and the covariates c = (ct, t = 1, …, T).
The EST z is the aggregate of the I neurons' spike trains yi = (yit, t = 1, …, T), i = 1, …, I. The yi's are not observed directly, and it is the purpose of spike sorting to estimate them by assigning spikes to neurons based on classifying the waveform measurements at. To develop our algorithm, it is useful to associate with a spike at t, the unobserved I-dimensional binary latent vector xt = (x1t, …xIt), where xit = 1 (0) means that neuron i did (did not) spike at t. When zt = 0 (no spike was recorded at t), xt is a vector of zeros (no neuron spiked). When zt = 1, all we know is that xt is not identically zero, and we let denote the set of (2I − 1) distinct values xt can take, which give all possible subsets of the I neurons spiking approximately together to produce a spike at t. For example, if a spike is detected at an electrode that records I = 2 neurons, the combinations of neurons that could have produced that spike can take 22 − 1 = 3 values, x = (1, 0) ≡ 10, (0, 1) ≡ 01 or (1, 1) ≡ 11, depending on whether neuron 1 or 2 spiked alone or together. The complete set of latent variables is x = (xt, t = 1, …, T). In statistical jargon, (z, x) is a latent marked point process with x as the unobserved marking variable.
2.1. Classic Automatic Spike Sorting Based on Waveform Information.
The result of spike sorting is rarely perfect. In the ideal case, when the variabilities of each neuron's WFs are smaller than the variability between WFs from different neurons, WFs will form distinct clusters and neurons can be isolated perfectly. Then x†i = yi, the true spike train of neuron i. More often, clusters overlap so that the WFs that belong to overlapping regions cannot be classified with perfect confidence. Spike misclassification errors are unavoidable. However, if equation 2.1 is indeed the distribution of the data, the Bayes classification rule in equation 2.2 is known to be optimal, which means that it has the lowest spike misclassification rate of all spike-sorting rules (see e.g., Wasserman, 2004). But as we show in the next section, equation 2.1 does not use all the information in the data, so that the traditional automatic spike sorter is not fully efficient.
2.2. Incorporating Tuning Information for Spike Sorting.
As with traditional spike sorters, perfect spike classification is rare. However, the proposed spike sorter incorporates all the information available about spike identities, and it is derived from the optimal Bayes' rule in equation 2.7, so it has the lowest spike misclassification rate of all classification rules. In particular, it has a lower spike misclassification rate than the traditional automatic spike sorter described in section 2.1.
2.2.1. Special Case: Classic and Proposed Approaches Are Equivalent When Waveform Features Clusters Are Perfectly Separated.
When WFs clusters are perfectly separated, a spike with WF a has f*x(a) = 0 for all , except for the particular x that produced the spike. Hence equations 2.2 and 2.7 both equal one if the spike was produced by neuron combination x, and zero otherwise. This shows that the new spike-sorting rule does not depend on the covariate c, so it reduces to the traditional spike-sorting rule. Tuning function estimates will thus be the same under both approaches, given the same tuning model.
2.2.2. Automatic Spike Sorting Based on Tuning Information Only.
2.3. Estimation of the Waveform-Based Spike-Sorting Rule.
So far we have used the true proportions π*x, distributions f*x, and tuning curves λ*i. In practice, these are unknown and must be estimated from data. We approximate f*x(a) by fx(a; ψx) indexed by parameters ψx, for example, normal distributions with means and variance-covariance matrices ψx = (μx, Σx). We approximate λ*i(c) by functions λi(c; θi) that depend on parameters θi. Although the f*s and λ*s play parallel roles for spike sorting, WFs are qualitatively invariant to the experiment that produce them, and models for their distributions have long been evaluated. In contrast, neurons' tuning properties are not only experiment specific, they are typically of paramount interest in practice. Therefore, while it is reasonable to use parametric models for fx, λi should be nonparametric functions unless more information is available. (This point is discussed further in section 3.) We denote by Θ = (θi, i = 1, …I), , and the combined vectors of parameters. We estimate Θ, Π, and Ψ by the method of maximum likelihood (ML), because it makes the most efficient use of data (Kass, Ventura, & Brown, 2005).
(M-step): Maximizing Q(Π, Ψ, Π(k), Ψ(k)) with respect to (Π, Ψ) to obtain update parameter values (Π(k+1), Ψ(k+1))
Mixtures of normal distributions are often used for f(a; Π, Ψ), although Harris et al. (2000) and Shoham et al. (2003) provide convincing evidence that t-distributions might be are more appropriate. We use univariate normal distributions throughout this letter for algorithmic simplicity. It would be straightforward to use instead t or other distributions deemed more appropriate, provided an EM algorithm existed or could be developed to estimate their parameters. For a mixture of univariate normal distributions, the general EM algorithm given above reduces to:
Pick starting values (Π(0), Ψ(0)). Repeat steps 1 and 2 until convergence.
- (E-step): For each recorded WF at, with associated latent variable Xt, calculate the responsibilities of at to each of the mixture components, 2.2 at (Π, Ψ) = (Π(k), Ψ(k)).
References for the normal mixture EM are numerous (see e.g., Hastie, Tibshirani, & Friedman, 2001, sec. 8.5). The case of a mixture of t-distributions was treated by Shoham et al. (2003) in the context of spike sorting. Figures 2A and 8A show step-by-step runs of this algorithm applied to two simulated data sets.
2.4. Estimation of the Waveform and Tuning-Based Spike-Sorting Rule.
(M-step): Maximizing Q(Θ, Ψ, Θ(k), Ψ(k)) with respect to Θ and Ψ to obtain parameter updates Θ(k+1) and Ψ(k+1).
This algorithm is general and has the same form as the algorithm given in the previous section.
We now have all the elements to run the EM algorithm to find the ML estimate of (Ψ, Θ), which in turn provides estimates of f*(a ∣ c) in equation 2.6 and of the neurons' tuning curves λ*i(c). In the particular case when f*(a ∣ c) is assumed to be a mixture of univariate normal distributions, this algorithm to reduces to:
Pick starting values Ψ(0) and Θ(0). Repeat steps 1 and 2 until convergence.
- (E-step): For each WF at, with associated latent variable Xt, calculate the responsibilities of at to each of the WF mixture components, 2.7 at (Θ(k), Ψ(k)).
- (E-step): For each spike zt = 1, with associated latent variable Xt, calculate the responsibilities of zt to each of the I neurons, 2.7, and summation over the 2I−1 values of x = (x1, …, xI) that have xi = 1. Given zt = 0, we have trivially
(M-step): For i = 1, …, I, regress eit on ct, t = 1, …, T to obtain θ(k+1)i, the parameter of the tuning curve λi(c, θi).
To distinguish this algorithm from the EM algorithm of the previous section, we refer to it as the linked EM algorithm, since it merges two partial EM algorithms. Note that steps 1 and 2 can be executed in any order. The particular form of step 1 assumes that waveforms are normally distributed. Distributions other than normal can be used provided an EM algorithm exists to estimate their parameters. Step 2 assumes that spike trains are Poisson. Alternatives are discussed in section 4.
2.4.1. Special Case: Perfectly Separated Waveform Features Clusters.
As mentioned earlier, traditional and proposed approaches yield the same spike-sorted data in theory when WFs clusters are separated. This remains true in practice, provided the WFs' distributions fx are not too badly misspecified. Indeed in that case, after convergence of the EM algorithms of sections 2.3 and 2.4, the responsibility of WF at to each of the mixture components is wxt = 1 if at was produced by neuron combination x, and wxt = 0 otherwise. That is, both spike-sorting rules correctly classify all spikes. The linked EM algorithm also provides estimates of the tuning functions, which are the same estimates we would obtain with the traditional approach. Indeed, at convergence, the responsibilities of zt to each of the I neurons are eit = 1 if the spike at t was produced either by neuron i alone or jointly with other neurons and eit = 0 otherwise. Hence the M-step 2b consists of the regressions of the true neurons' spike trains on c.
2.4.2. Spike Sorting Based on Only Tuning Information.
To spike-sort the data using only tuning information, all we need is the ML estimate of Θ, which in turn gives an estimate of the spike classification rule in equation 2.8. To do that, we use the linked EM algorithm, but we set the initial values ψ(0)x to the same constant for all , we let Ψ(k) = Ψ(0) for all iterations k, and we run only step 2 of the algorithm. The linked EM algorithm effectively reduces to the algorithm of Ventura (2008). Figures 3 and 11B provide illustrations.
2.5. A Clutter Cluster Serves to Collect Noise and Outlying Spikes.
For automatic spike sorting, WFs at are typically assumed to originate from one of several components, which implicitly correspond to different neurons. To explicitly allow for neurons spiking together, we instead assumed that at originates from one of card components, each corresponding to a combination of neurons spiking approximately together to produce a spike. This choice is important to identify not only joint spikes but also individual neurons' spikes in the low-SNR case, when spikes are contaminated by noise, as we illustrate in section 3.3. But this choice also means that the numbers of probabilities π*x in equation 2.1 and distributions f*x(a) in equations 2.1 and 2.6 increase exponentially with I. Some of these quantities cannot be estimated accurately with a finite amount of data. For example, suppose that an electrode records I = 4 neurons, all firing with a homogeneous Poisson rate of 100 Hz, and that a single, substantially different, waveform is recorded whenever two spike peaks are separated by less than γ = 2 msec. Then 1 second of data would typically contain 51 clean spikes from each of the I = 4 neurons, 13 overlaps from each of the pairs of neurons, 3 overlaps from each of the triplets of neurons, and no occurrence of the 4 neurons spiking approximately together. While this provides enough data to estimate the π*x and f*x(a) that correspond to single neuron spikes, the estimates of these components for such that ∑ixi ⩾ 2 would be at best too variable to be useful.
Because equation 2.16 explicitly models through πx(c; Θ) the probabilities of several neurons spiking together, joint spikes' clusters should be more easily identified. Hence, it seems likely that the chosen model will be such that , and that the components in will correspond to clusters of joint spikes. This is illustrated in section 3.3.
2.6. Implementation Details.
As with all EM algorithms, the issues of choosing initial values, determining the number of neurons, and avoiding convergence to local likelihood maxima must be addressed. These issues have been the subject of extensive research in the statistics community. (For recent reviews, see Sahani, 1999; McLachlan & Peel, 2000; Figueiredo & Jain, 2002.) Although practically important, these issues are not specific to the EM algorithm developed here, so we kept their implementation to a minimum to avoid cluttering the main ideas. Basically, we chose reasonable initial values and determined the number of neurons using a penalized likelihood approach, as described below. We did not implement any fancy parameter space search to avoid local likelihood maxima. However, we give evidence in section 3.6 that including tuning information for spike sorting helps determine the number of neurons and facilitates the convergence of the algorithm to the global maximum.
Throughout this letter, we assume without loss of generality that the waveforms' first PC distributions , are univariate normal distributions with means and variances (μx, σ2x). To fit I neurons to an electrode, we choose initial values μ(0)x and σ(0)x as follows. For the I combinations corresponding to single neurons (∑ixi = 1), we simulate μ(0)x at random between the th and the th sample quantiles of at. For example, when I = 2, we simulate μ(0)01 at random between the 5th and 45th quantiles of at and μ(0)10 between the 55th and 95th quantiles. This guarantees that the μ(0)x are random but well spread out. We simulate σ(0)x at random between S/(I + 2) and S/I, where S is the sample standard deviation of at, so that the initial PC distributions do not overlap much. For that correspond to joint spikes, we let μ(0)x be the sample mean of at and simulate σ(0)x at random between 90S and 100S. This is akin to using very spread uniform distributions and reflects our lack of knowledge about where joint spike clusters might be. We then simulate the π(0)x that correspond to single neurons at random between 30% and 70%, and take the π(0)x that correspond to joint spikes to be the product of the corresponding single neuron probabilities. Finally we standardize the π(0)x so they sum to one. As for initial values for λi to run the linked EM algorithm, we take them to be equal to the constants π(0)x for the I values of x such that ∑ixi = 1.
But the new spike sorter seems to put the cart before the horse: How can tuning functions be used before the data are spike-sorted? The answer lies with the linked EM algorithm (see section 2.4) used to estimate equation 3.2: given the spike times and waveform measurements, the algorithm is designed to provide estimates of the neurons' tuning functions, just as the traditional EM algorithm (see section 2.3) is designed to provide estimates of the waveform distributions fx in equation 3.1. Figures 2, 8, and 12 illustrate this and also illustrate that the proposed method reduces spike misclassification rates, helps identify joint spikes, and improves the convergence of the EM algorithm.
To avoid the inherent uncertainty in determining the true spike identities in extracellular recordings, we use simulated data, and to produce easily readable graphics, we use electrodes that record only two neurons. Although in real neural data, there is typically significant information for spike sorting in up to three or four of the PCs, we use only one PC to facilitate the visual comparison between spike sorters. This choice implies no loss of generality. Indeed, both spike sorters use the same information from waveforms, so it is fair to compare their performances. In practice more PCs are used to reduce WF overlap. Section 3.5 illustrates how the spike misclassification rates vary with cluster overlap. Finally, we assume that waveforms' PCs are normally distributed and that neurons spike according to inhomogeneous Poisson processes.
3.1. Motor Cortex Example.
This example is inspired by movement decoding experiments in primary motor cortex (M1). Say that I = 2 simulated M1 neurons are recorded by the same electrode while a monkey traces a 2D constant-speed circular trajectory with his hand. The trajectory can be described by a scalar parameter , which measures the angle of the hand position (x, y). The true tuning curves λ*i(d) are shown in Figure 1B. By true, we refer to the unknown mechanism that generates the spike trains rather than to the model we chose to fit to them. To produce easily readable graphics, we used unrealistically highly modulated tuning curves, with minimum and maximum firing rates approximately 0 and 110 Hz, respectively. Our other examples are realistic. We found that the tuning curves and their estimates appeared visually less cluttered when plotted in a circular coordinates, as displayed in Figure 1B.
With I = 2 neurons, card types of spikes can be recorded at the electrode, which are the spikes generated by neurons 1 and 2 alone, and their joint spikes. They are coded by x = (1, 0) ≡ 10, x = (0, 1) ≡ 01, and x = (1, 1) ≡ 11, respectively. We assume that the signal-to-noise ratio is large enough so that spikes crossing threshold are real spikes rather than noise. The noise case is treated in section 3.3. Without loss of generality, we use the waveforms' first PCs (PC1) for spike sorting and assume that their true distributions f*x are normal with means and variances (6, 1), (8, 1), and (10.5, 3) for x = 10, 01, and 11, respectively. These parameters are arbitrary but could be made to match actual data by shifting and rescaling the normal distributions without qualitatively changing the results. Given the tuning curves specified above, the true probabilities of neurons spiking alone and together are π*10 = π*01 = 49.5% and π*11 = 1%. Figure 1A displays π*xf*x for . Because π*11 is close to zero, π*11f*11 is hard to distinguish from the x-axis. The waveform clusters π*xf*x overlap, so spikes cannot be sorted with perfect confidence.
Using the model in Figures 1A and 1B, we simulated the electrode spike train during 50 loops of the circular trajectory, letting the neurons spike according to Poisson processes with rates λ*i(d). A color-coded histogram of the spikes' PCs is in Figure 1C. In practice, the spike identities are unknown, and it is the purpose of spike sorting to determine them. The traditional procedure consists of clustering the data that make the histogram in Figure 1C. The proposed spike sorter also uses tuning information, which we represented by the peristimulus histogram of the spikes in Figure 1D. This plot shows that spiking happens primarily when directional tuning is approximately between −π/4 and π/2 + π/4, although, just as with the histogram of PC1 in Figure 1C, it is hard to distinguish individual neurons with the naked eye.
Before we proceed with spike sorting, we must select models to fit to the data. To ensure that the results of this illustrative example are not corrupted by possible effects of model inadequacies, we use the correct family of models: , are taken to be normal distributions with means and variances (μx, σ2x) and the tuning curves to be of the form λi(d; θi) = exp(θ0i + θ1icos(d) + θ2isin(d)), i = 1, …, I. The parameters and Θ = (θi, i = 1, …, I) are estimated using the EM algorithms in sections 2.3 and 2.4, with initial values generated according to section 2.6. The number of neurons recorded by the electrode was determined by penalized likelihood: both AIC and BIC found I = 2 neurons, the correct number. Plots from left to right in Figure 2A show π(k)xfx(a; ψ(k)x), k = 0, …, 5, the initial values and first five iterations of the EM algorithm of section 2.3, and the last panel shows , the solution after convergence, for x = 01 and x = 10. We omitted estimates of π11f11(a; ψ11) because they cannot be distinguished from the horizontal axis. Figure 2B shows a run of the linked EM algorithm. The same initial values were used in Figures 2A and 2B. The successive plots show [∫πx(c; Θ(k))dc]fx(a; ψ(k)x) for x = 01 and x = 10, and λ(d; θ(k)i), i = 1, 2, for k = 0, …, 5, as well as the solution after convergence.3 This shows clearly that the linked EM algorithm can indeed estimate the tuning curves before the data are spike-sorted.
With Π, Ψ, and Θ estimated, we apply the classification rules in equations 2.2, 2.7, and 2.8 to sort the electrode spike train. Figure 3 shows the outcome for one trial of the simulated data. From inward to outward, the plot shows the neurons' true tuning curves λ*i(d); the observed electrode spike train for the selected trial, z; the true spike train of neuron 1 we aim to retrieve by spike sorting, y1; and the spike-sorting classification obtained by using spike waveform information only (see equation 2.2), tuning information only (see equation 2.8), and both sources of information (see equation 2.7). We see that for waveform-based spike sorting, misclassification errors are more numerous in the neurons' preferred directions. This happens because when d is close to the preferred direction of neuron 2 (1), almost all spikes recorded at the electrode belong to neuron 2 (1), yet spike sorting classifies them according to waveform information only. On the other hand, tuning-based spike sorting assigns spikes to neurons only when their firing rates are comparatively larger than the firing rates of the other neurons. Although all spike-sorting rules produce errors, combining all sources of information clearly does best. The resulting spike misclassification rates for traditional and proposed approaches are 18% and 9%, respectively, on average across many repeat simulations. Figure 4 shows the same spike identity color-coded histogram as Figure 1C, with misclassified spikes indicated in black.
The proposed spike sorter also helped identify joint spikes: traditional spike sorting (see equation 2.2) retrieved 37% of the actual joint spikes, while the proposed method retrieved 45%. Also for both methods, 90% of all identified joint spikes were actual joint spikes. The noisy channel example of section 3.3 further illustrates the benefit of modeling the probabilities of joint spiking as functions of the firing rates.
3.2. Model Selection for Tuning Curves.
The previous example was designed to illustrate the properties of the new spike sorter with easily interpretable graphics, but that example was unrealistic in several ways. First, M1 neurons are rarely so strongly modulated or exactly cosine tuned, but worse was to assume parametric models for their tuning curves to run the linked EM algorithm. Use of parametric models for WFs is justifiable: normal or t-distributions have long been used successfully, their relative advantages and shortcomings are well documented, and WFs' properties are similar in all experiments. On the other hand, tuning properties are experiment specific and often of primary interest. This example shows how to build nonparametric tuning curve models to run the linked EM algorithm.
Again we consider I = 2 simulated M1 neurons, recorded by the same electrode, while a monkey traces a 2D constant-speed circular trajectory. We use the same WF distributions as in the previous example, but we now assume that the true tuning curves λ*i(d) are double-peaked. They are shown as thick gray curves in Figures 5 and 6. Given these choices, the probabilities that neurons spike alone and together are π*10 = 32%, π*01 = 66%, and π*11 = 2%. We simulated the electrode spike train from this model during 50 loops of the circular trajectory, letting the neurons spike according to Poisson processes with rates λ*i(d).
In practice, neurons may or may not be tuned to particular covariates, or different neurons might be tuned to different covariates. Here we considered the covariates cos(d) and sin(d), which makes sense given the application, and also experimental time t, to capture potential temporal drift effects. To determine which covariates modulate the firing rates, we run the linked EM algorithm and test the statistical significance of each covariate using likelihood ratio (LR) tests (Kass et al., 2005) or model selection criteria like AIC or BIC. That is, we test which components of Θ are significantly different from zero. An LR test thus determined that experimental time did not significantly modulate neurons' spiking (p-value ≪0.001), so we removed t from the model. Although this conclusion is obvious here since we know how the data were generated, the process illustrates how to select appropriate covariates in real applications.
The degrees of smoothness of the gaussian filters were determined from data by cross-validation, a model choice criterion asymptotically equivalent to AIC. Within the family of nonparametric models specified by equation 3.3, the optimal model for neuron 1 (solid curve in Figure 5) uses 12 DFs—1 for the intercept and 5.5 for each of cos(d) and sin(d). The optimal model for neuron 2 (dashed curve in Figure 5) uses 1 DF for the intercept and 4.75 for each of cos(d) and sin(d). The DFs for cos(d) and sin(d) are much larger than one, which strongly suggests that the neurons are not cosine tuned.4 An LR test comparing cosine and optimal models further confirmed this, as did the K-S goodness-of-fit test of Brown, Barbieri, Ventura, Kass, and Frank (2002) (not shown). In contrast, equation 3.3 fitted to the cosine neurons of section 3.1 yielded DFs close to one, while LR and K-S tests did not reject cosine tuning curves as viable models for those neurons. This shows that common likelihood-based model selection techniques can be applied within the proposed approach to build appropriate tuning curve models.
Note that adding tuning information reduced the spike misclassification rate for neuron 1 from 30.5% to 16.8% and increased that for neuron 2 from 8.4% to 9% on average across simulated samples. Accounting for the two neurons producing 32% and 66% of the spikes, respectively, the overall spike misclassification rate was reduced from 15.5% to 11.5%
Finally, to illustrate the danger of using an inappropriate tuning model, we also fitted exponential cosine curves to the data of Figure 5. The misclassification rate for neuron 1 was reduced from 30.5% to 14%, that for neuron 2 increased from 8.4% to 13.6%, and the overall rate was still reduced from 15.5% to 13.8%. However, the improved rate does not mitigate the severity of the bias in the tuning curve estimates (see Figure 6A). To be fair to our method, we also fitted exponential cosine tuning curves to the true neurons' spike trains. The resulting estimates in Figure 6B are also grossly biased. This shows that assuming an inappropriate tuning curve model has similar consequences regardless of the estimation method.
3.3. Noisy Channel Example.
In classic single-electrode electrophysiology, waveforms are often well isolated by hand-positioning electrodes. In that case, classic and proposed spike sorting are equivalent. However, the traditional practice of optimizing the signal-to-noise ratio by manipulating the electrode placement is no longer always possible or practical, for example, when electrodes or arrays are chronically implanted. In that case, many voltage measurements exceeding the threshold will be noise or artifacts, and a significant proportion of spikes will be corrupted by noise. Our method can handle such situations. All we have to do is assume that one of the I neurons recorded by the electrode is a “noise neuron” whose firing rate is constant, and to which we assign the noise spikes. Additionally, as argued in section 2.5, equation 3.2 explicitly models through π*x(c) the probabilities of several neurons' spiking together as functions of the neurons' firing rates, which will facilitate the identification of noise-corrupted spikes.
Consider the simulated example inspired by Olson, Gettner, Ventura, Carta, and Kass (2000). The experiment that produced the data analyzed there examined the temporal evolution of neuron firing in a part of the brain affected by attention, the supplementary eye field (SEF). A simplified version of the experiment is as follows:
A central target appears on a screen at time t = 0. A monkey must maintain fixation on the target.
A cue appears at a peripheral target and gets turned off.
At t = 200, the central target is turned off, which is the signal to move the eyes to the peripheral target.
One objective of the study was to draw inferences about the temporal evolution of neurons' firing rates. In that case, the covariate that modulates the firing rates is ct = t, the experimental time. Figure 7B shows the firing rate of a neuron typical of those we estimated for these data. We use this rate as the true firing rate λ*1(t) of a neuron in the simulation below. Suppose that the noise on that electrode is normally distributed with mean 0 and variance 2. We set the threshold at 1, so that a spike is recorded each time the electrode voltage exceeds 1. A noise signal with the stated characteristics, sampled every millisecond and thresholded at 1, corresponds to a constant noise spiking rate of λ*2(t) = 308 Hertz.5 Once again, we reduce recorded waveforms to their first PCs and take their true distributions f*01 to be normal with mean 8 and standard deviation 1. We also assume that the first PCs of noise measurements are normally distributed, with mean 0 and standard deviation 5. Given the firing rate of the neuron in Figure 7B and the noise specification, the probability that the voltage-crossing threshold corresponds to noise is π*10 = 87.1%. The remaining crossings correspond to actual spikes in proportion π*01 = 8.9%, or to spikes corrupted by noise in proportion π*11 = 4%. That is, about 30% of the neuron's spikes are corrupted by noise. The PCs of the noise-corrupted spikes are unlikely to have distribution f*01, so we assumed that f*11 was normal with mean 4 and standard deviation 2. Figure 7A displays π*xf*x for .
Using the model in Figure 1, we simulated the noisy electrode spike train for 20 repeats of the experiment, letting the neuron and the noise spike according to Poisson processes with rates λ*i(d), i = 1, 2. For spike sorting, we assumed normal distributions for the PCs, we assumed that the noise was produced by a “noise neuron” with unknown constant spiking rate, and, with no obvious choice of firing rate model, we used regression splines in t with six knots placed at equally spaced quantiles of the data, as in Ventura, Carta, Kass, Gettner, and Olson (2002). Another reasonable nonparametric model that could be fitted to these data is a PSTH, as is done in Ventura (2009).
Figure 8 shows step-by-step runs of the traditional and linked EM algorithms to estimate equation 2.2 and equation 2.7, respectively. The same initial values were used in Figures 8A and 8B. Once again, Figure 8B shows clearly that the linked EM algorithm can estimate firing rates before the data are spike-sorted. Figure 8 also illustrates the benefit of modeling the probability of joint spikes π11(c; Θ) as a function of the firing rates, as per equation 2.4. Indeed, given the same initial values, the traditional EM algorithm fails to detect the cluster of noise-corrupted spikes, while the linked EM algorithm identifies the three clusters well. For traditional spike sorting, the estimate of the proportion π*11 of noise-corrupted spikes is less than 10% of its true value. In contrast, the proposed procedure has for all c, and , so it accounted for noise-corrupted spikes in the correct proportion. This facilitated the estimation of f11(t; ψ11) and, in turn, the estimation of the other WF distributions. We discuss convergence properties of the EM algorithms further in section 3.6.
With Π, Ψ, and Θ estimated, we apply traditional and proposed spike sorters to the electrode spike train. The noise cluster overlaps almost completely the spike WF clusters, which results in a very high spike misclassification rate of 51%. The proposed approach reduces that rate by only 1.5%. Note, however, that a modest reduction in misclassification rates can have substantial effects on tuning curve estimates, which in turn might have an impact on scientific conclusions, as we illustrate in the next section and return to in section 4.
3.4. Designed Experiment Example.
Consider a common type of experiment, whose aim is to compare the firing rates of neurons under two or more experimental conditions. For example, Olson et al. (2000) compared the responses of supplementary eye field (SEF) neurons to eye saccades made in response to endogenous and exogenous cues. Once again we consider an electrode that records I = 2 simulated neurons, whose waveform first PCs are as in Figure 1A. We consider two experimental conditions c = 1 and c = 2 and assume that the neurons' true firing rates are constant within each experimental condition; neuron 1 spikes at λ*1(c) = 50 Hz regardless of the condition, while neuron 2 does not spike when c = 1 and spikes at 100 Hz when c = 2. Neuron 1 is not tuned to c, which we should be able to verify from data.
We simulated the electrode spike train for 10 seconds in each condition and ran traditional and proposed spike sorters, assuming normal models for the spikes' first PC distributions, and a within-condition constant rate model for λi(c; θi). Although firing rates are constant within conditions, tuning information is still available for spike sorting because the rates are not all constant between conditions. Indeed, using tuning information reduced the spike misclassification rate from 18% to 11%.
Figure 9 shows true and estimated firing rates, obtained by regression from spike-sorted data in the traditional approach, and as the output of the linked EM algorithm in the proposed approach. The latter estimates are equal to the true rates within error. We also performed LR tests6 to compare firing rates across conditions and found that neuron 2 had significantly different rates (p-value ≪0.0001), whereas neuron 1 did not (p-value =0.84%). This is consistent with the truth. This example illustrates once more that the new spike sorter not only can estimate the neurons tuning functions but also can identify the covariates that modulate these functions.
What is troubling is that the traditional procedure yields firing rate estimates that are not equal to the true rates within error (see Figure 9B). Additionally, t-tests determined that both neurons had significantly different rates in the two conditions (p-values P ≪ 0.0001). This example is discussed further in section 4.
3.5. Spike Misclassification Rates Depend on the Amount of Waveform and Tuning Information.
The proposed spike sorter yields a lower spike misclassification rate because it uses more information. In practice, the reduction in rate will be more or less substantial depending on how much information tuning carries about spikes' identities. The only situation the proposed method cannot improve in theory is when neurons are not tuned. Then π*x(c) = π*x, and new and traditional spike sorters are equivalent. If all neurons have the same tuning curves, tuning does not provide information to identify single-neuron spikes, but joint spikes should still be more easily identified.
We revisit the motor cortex example of section 3.1. The WF distributions of the two neurons overlapped by 19%, so spikes' classification errors were unavoidable. Their tuning curves overlapped by 21% and thus carried substantial information about spike identities, which reduced the spike misclassification rate from 18% to 9%. We now repeat the same spike-sorting exercise, letting the overlaps of the tuning curves and the WF clusters vary. The precise definition of overlap is given in the appendix. Although overlap percentage is only a crude measure of curve similarity, the idea is that the more two curves overlap, the less information they provide to identify neurons. An overlap of 0% means that two curves (i.e., two WF distributions or two tuning curves) share no overlap. An overlap of 100% means that two curves are identical so they provide no information to separate neurons.
Figure 10C plots the spike misclassification rates for traditional and proposed spike sorters as functions of the overlap percentage between the WF distributions f*01 and f*10. The solid bold curve corresponds to traditional spike sorting and the lettered curves to the proposed approach for the five tuning curve overlaps shown in Figure 10B. (Note that circular coordinates do not visually represent overlap percentages well.) The misclassification rates reported for Figure 4 can be read from the bold and c-curves in Figure 10C, at value 19% on the x-axis. The values of the lettered curves when the WF clusters overlap is 100% give the spike misclassification rate when only tuning information is used for spike sorting (see equation 2.8). If the tuning curves share no overlap (curve e), spikes can be classified perfectly whatever the information in waveforms. It is clear from Figure 10C that incorporating tuning information always helps classify spikes—if, that is, the tuning curve model is not too badly misspecified (see section 3.2). Given that traditional and proposed approaches share the same set of assumptions, this help comes at the very modest computational cost of running the linked EM algorithm in place of the traditional EM algorithm.
Note that the misclassification rates in Figure 10C did not account for the classification of the joint spikes, because it was not easy to find a meaningful definition of the overlap between the three distributions f*x we could use for the x-axis. However the proposed spike sorter helped retrieve about 20% more joint spikes than the traditional sorter for all configurations of tuning curves shown in Figure 10B. This remains true when the two neurons have equal tuning curves (not shown).
We also assumed that the electrode recorded I = 2 neurons rather than estimate this number by penalized likelihood. In practice, I needs to be determined from the data, with outcome matching the truth or not depending on the amount of waveform and tuning information. This is discussed in the next section.
3.6. Combining Waveform and Tuning Information Helps the EM Algorithm to Converge.
With all EM algorithms, the issues of choosing initial values, determining the number of neurons, and avoiding convergence to local likelihood maxima must be addressed. Although practically important, these issues are not specific to the EM algorithms used here, so we kept their development to a minimum to avoid cluttering the main ideas. However, we next provide evidence that including tuning information for spike sorting helps determine the number of neurons and facilitates the convergence of the algorithm to the global maximum likelihood.
Consider again the simulation in Figure 10, for which we had assumed I = 2. We now determine I by penalized likelihood. Consider first the traditional approach, which uses WF information only (solid bold curve in Figure 10). When the WF distributions f*01 and f*10 do not overlap, penalized likelihood consistently finds I = 2 neurons in every data set simulated from the model. When f*01 and f*10 overlap completely, that number drops to I = 1 in every simulated data set. And as expected, the chance of finding two neurons decreases as the overlap increases. Adding tuning information makes it more likely to determine the correct number of neurons. For example, if neurons have tuning curves as in Figure 10 Bcde, penalized likelihood correctly finds I = 2 neurons in every simulated data set, regardless of the amount of WF information. This includes the case where WF distributions overlap completely. Tuning curves such as those in Figure 10 Bab also increase the chance of finding the correct number of neurons, although the chance of finding only one neuron is still substantial when WF distributions overlap substantially.
Next, Figure 11 illustrates that combining waveform and tuning information helps the EM algorithm converge to the global maximum likelihood. Figure 11A shows π*xf*x(a) and their estimates obtained by the traditional EM algorithm applied to a data set simulated from the model in Figure 1. True and estimated curves do not match perfectly, which is due partly to data variability (estimates match the truth only up to random error), but could also be due to the EM algorithm converging to a local maximum. Figure 11C shows the outcome of the linked EM algorithm applied to the same data set and initialized with the same values. True and estimated WF distributions now match more closely, which confirms that the algorithm in Figure 11A did converge to a local maximum. For completeness, we also applied the second part of the linked EM algorithm to the same data set, with the aim of spike-sorting the data based on tuning information only. The resulting tuning curve estimates are in Figure 11B. The large discrepancy between true and estimated curves in comparison to Figure 11C suggests that the algorithm converged to a local rather to the global maximum.
The lower panels of Figure 11 show the true curves and simulation bands in which 95% of estimates from 100 simulated data sets fall. The bands in Figure 11C remain unchanged if we let the WF distributions f*01 and f*10 spread apart so they no longer overlap (not shown), which suggests that the variability we see in Figure 11C is due to the randomness of the data rather than to convergence difficulties of the linked EM algorithm. By comparison, the fatter bands in Figures 11A and 11B must therefore be partly due to convergence problems.
Figure 11 suggests that the effect of adding tuning information to the waveform information, or vice versa, facilitates the joint estimation of WF distributions and tuning functions, by which we mean that Ψ and Θ are more likely to be estimated by their maximum likelihood estimates rather than by some values at a local maximum of the likelihood. Note that improved estimation is a benefit that is separate from the spike-sorting problem. To clarify, combining both sources of information yields a spike sorter that is superior to the traditional sorter. Because spike-sorting rules depend on unknown parameters, they must be estimated from data. The estimates of WF distributions and tuning curves happen to be superior when they are obtained jointly by the linked EM algorithm rather than separately.
Current spike-sorting methods focus on clustering neurons' characteristic spike waveforms. The resulting spike-sorted data are then typically used to estimate how covariates modulate the firing rates of neurons. However, when covariates do modulate the firing rates, they too provide information about spikes' identities, which thus far has been ignored for the purpose of spike sorting. We proposed a new automatic spike sorter that combines spike waveform and tuning information. In theory, it yields the lowest spike misclassification rates since it efficiently uses more information than current spike sorters do. It also facilitates the identification of joint spikes and noise-corrupted spikes, because it models explicitly the probabilities of several neurons spiking together as functions of their firing rates. But as with all other statistical methods, its performance in practice will depend on the quality of the assumed models for waveform features and tuning functions.
The proposed spike sorter essentially consists of performing spike-sorting and tuning function estimation simultaneously rather than sequentially, as is currently done. No more assumptions are needed, although the tuning model must now be specified before the data are spike-sorted. While parametric models such as normal or t-distributions are often assumed for waveform features, tuning functions should be modeled nonparametrically unless specific knowledge suggests otherwise. Indeed, waveform features are qualitatively invariant to the experiments that produce them, and models for their distributions have long been evaluated. In contrast, neurons' tuning properties are not only experiment specific, they are typically of primary interest. We showed that nonparametric firing rate models such as gaussian filters and splines could be used within our framework and that likelihood-based model selection techniques could be applied. For example, we tested that gaussian filters provided significantly better fits than cosine functions to the tuning curves of simulated motor cortex neurons and determined the smoothness of these filters by cross-validation. We also showed that standard statistical tests of hypotheses could determine which covariates modulate neurons spiking.
So what could go wrong? We illustrated that assuming an inadequate tuning model could severely bias tuning curve estimates, and hence bias spike classifications. But to be fair to the proposed approach, tuning curve estimates were just as biased when the same model was fitted to the true neurons' spike trains. This shows that model selection for tuning functions is equally important whatever the estimation procedure may be. An inadequate model for waveform features could also bias tuning curve estimates, since both waveform and tuning models are estimated simultaneously. But tuning curves fitted to data first spike-sorted based on waveforms would also be biased, since a bad waveform model would bias spike classifications.
Unexpectedly, it is the traditional procedure that produced a puzzling result. In the example in section 3.4, the neurons' firing rates estimated by the proposed spike sorter matched the true rates within error, whereas the rates estimated from data first spike-sorted based on waveforms did not. Note that this effect was not seen in Figures 5 and 6, because the rates estimated by the proposed spike sorter were compared to estimates obtained from the true neurons' spike trains. The result of section 3.4 is not a fluke but a general flaw of the traditional sequence of first spike sorting based on waveforms, then estimating tuning functions. Ventura (2009) shows that tuning functions estimated by the traditional approach are systematically biased and inconsistent, whereas the proposed spike sorter yields consistent estimates.
Our last comments concern the assumptions we made and possible algorithmic improvements. We estimated the proposed spike sorter via an EM maximum likelihood algorithm. We presented the general form of this algorithm and provided a detailed implementable version under the assumption that neurons are independent and spike according to Poisson processes. We assumed that spike waveform features were normally distributed, but distributions deemed more appropriate could readily be used instead, for example, t-distributions, as suggested by Harris et al. (2000) and Shoham et al. (2003), provided an EM algorithm existed or could be developed to estimate their parameters. Some spike-sorting methods also include, with varying degrees of formality, additional information such as refractory periods and nonstationarity of waveforms, for example spike amplitude decays after short interspike intervals (Fee et al., 1996; Pouzat, Delescluse, Voit, & Diebolt, 2004). We plan to incorporate these effects in our framework by letting the neurons' firing rates depend on the past. We kept to a minimum the issues of determining the number of neurons, choosing initial values, and avoiding convergence to local likelihood maxima. However, we provided evidence that including tuning information alleviated these issues. More sophisticated parameter space searches could also be implemented, for example, reversible Markov chain Monte Carlo methods (Richardson & Green, 1997), or the method of Figueiredo and Jain (2002).
A.1. Step-by-Step Run of the Linked Algorithm for Figure 5.
See Figure 12.
A.2. Percentage Overlap Between Two Curves.
The percentage overlap between two curves, for example, two tuning curves or two WF distributions (see Figure 13) is the intersection area under the curves (dark gray) divided by the total area under the two curves (light gray plus dark gray). A value of 1 indicates complete overlap and a value 0 perfect separation.
I was supported by NIH grants 2RO1MH064537 and 1R01EB005847 and thank two reviewers for their thoughtful comments.
Waveform measurements are seldom collected when the electrode voltage is below threshold (zt = 0). Missing values for at when zt = 0 do not affect our method.
Equation 2.7 misses the explicit condition zt = 1. However, this condition was implicit since then we considered only time bins such that zt = 1
The family of models defined by equation 3.3 includes cosine tuning as a special case, when the bandwidths correspond to one nonparametric degree of freedom (DF) for each of cos(d) and sin(d). The DFs of a nonparametric model are defined as the trace of the projection matrix for the regression, to mirror the parametric case, for which that trace equals the number of covariates included in the model. Hence, a nonparametric model that uses p DFs can be thought of as a polynomial of degree p in a covariate or as a model with p different covariates.
The probability of a spike in a 1 msec bin is Pr(Z>1) = 0.308 where Z is Normal(0,1).
In this particular application, the LR tests are equivalent to the well-known two sample t-tests.