## Abstract

Populations of cortical neurons exhibit shared fluctuations in spiking activity over time. When measured for a pair of neurons over multiple repetitions of an identical stimulus, this phenomenon emerges as correlated trial-to-trial response variability via spike count correlation (SCC). However, spike counts can be viewed as noisy versions of firing rates, which can vary from trial to trial. From this perspective, the SCC for a pair of neurons becomes a noisy version of the corresponding firing rate correlation (FRC). Furthermore, the magnitude of the SCC is generally smaller than that of the FRC and is likely to be less sensitive to experimental manipulation. We provide statistical methods for disambiguating time-averaged drive from within-trial noise, thereby separating FRC from SCC. We study these methods to document their reliability, and we apply them to neurons recorded in vivo from area V4 in an alert animal. We show how the various effects we describe are reflected in the data: within-trial effects are largely negligible, while attenuation due to trial-to-trial variation dominates and frequently produces comparisons in SCC that, because of noise, do not accurately reflect those based on the underlying FRC.

## 1 Introduction

Analysis of spike count correlation has provided important neurophysiological insights, as summarized in Cohen and Kohn (2011), Gu et al. (2011), Harris and Thiele (2011), Jeanne, Sharpee, and Gentner (2013), Mitchell, Sundberg, and Reynolds (2009), Smith and Kohn (2008), and Smith and Sommer (2013). However, as documented in these references, a striking feature of the interaction among neurons is that it can occur at multiple timescales. A neuron’s spike count collected across a time interval of length *T*, such as *T* = 1000 ms, reflects both the time-averaged drive to that neuron and additional stochastic fluctuations, which produce irregular spiking at timescales smaller than *T*, usually described as point process variation (Shadlen & Newsome, 1998; Churchland et al., 2011). That is, the spike count involves both a relatively slowly varying input drive, at the timescale *T*, and faster point-process fluctuations that determine when the neuron fires. Across an interval of length *T*, this point-process variation determines the number of times each neuron fires for a given level of time-averaged input drive. Correspondingly, for a given pair of neurons, the spike count correlation (SCC) may reflect (A) the firing rate correlation (FRC), that is, the correlation in the time-averaged drive to these neurons, and (B) point process noise (Kass & Ventura, 2006; Staude, Rotter, & Grün, 2008; Goris, Movshon, & Simoncelli, 2014). It also could involve (C) more precisely timed within-trial correlation effects that might arise, for instance, from (lagged) synchronous spiking (Kelly & Kass, 2012; Harrison, Amarasingham, & Kass, 2013), which has been observed in some cortical areas (macaque V4: Smith & Kohn, 2008; Smith & Sommer, 2013) and may be modulated by state transitions that are more prominent in some anesthetized preparations (Kelly, Smith, Kass, & Lee, 2010; Ecker et al., 2014).
Longer timescale effects, but still shorter than trial length, such as correlated up and down states (Steriade & Buzsaki, 1990; Cowan & Wilson, 1994; Timofeev, Grenier, & Steriade, 2001; Steriade, Timofeev, & Grenier, 2001), can also produce within-trial spike count covariability.

In this letter, we describe ways in which the SCC may not accurately measure the FRC in component A due to distortion caused by component B, and we present statistical methods for disambiguating component A from component B. Other studies have developed related techniques: Churchland et al. (2011), Goris et al. (2014), and Staude et al. (2008). Here we go further, using both parametric and nonparametric approaches, in order to provide a comprehensive set of tools that, as we show, perform well under conditions similar to those found with neural data. When component C is negligible, statistical inference and interpretation are simplified. Therefore, we focus especially on methods for evaluating the relevance of that component to the analysis of spike counts, including both statistical tests and estimates of magnitude. We then use the tools we have developed to analyze data recorded from area V4 while a monkey performed a fixation task, and we demonstrate that FRC behaves differently than SCC does in this setting.

We assume that the trial length is $T$ and that for trials $r = 1, \ldots, n$, each pair of observed spike counts for two neurons comes from the doubly stochastic correlated point process depicted in Figure 1. This process consists of two levels. The first level consists of a pair of latent (unobserved) variables $(X_{1r}, X_{2r})$ generated with some across-trial correlation (trial-to-trial correlation), where, on trial $r$, $X_{ir}$ is the average firing rate $FR_{ir}$ for neuron $i$ multiplied by $T$. The variable $X_{ir}$ may be interpreted as an average input drive to neuron $i$. The second level consists of observed spike counts $(Y_{1r}, Y_{2r})$, generated jointly given the rates. In addition, there may be some within-trial point-process correlation. Thus, the conditional expectation of $Y_{ir}$, conditionally on $X_{ir}$, is $E(Y_{ir} \mid X_{ir}) = X_{ir}$. We further assume that the conditional variance is $\mathrm{Var}(Y_{ir} \mid X_{ir}) = \phi_i X_{ir}$, where $\phi_i$ controls the dispersion of $Y_{ir}$ conditionally on $X_{ir}$ (Kass & Ventura, 2006; Churchland et al., 2011; Goris et al., 2014). The case $\phi_i = 1$ holds when $Y_{ir}$ follows a Poisson distribution, conditionally on $X_{ir}$.
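A small simulation illustrates the two levels of this model. This is a sketch with illustrative parameter values (the lognormal rate distribution anticipates the parametric models of Section 3.1); it is not code or values from our analysis.

```python
# Sketch of the two-level model: latent drives X_ir correlated across
# trials; spike counts Y_ir conditionally Poisson (phi_i = 1).
# All parameter values are illustrative, not fitted to data.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                        # trials (large, to check moments)
mu, sigma, rho = 3.0, 0.3, 0.6     # lognormal scale, spread, Gaussian corr.

# First level: bivariate lognormal drives with across-trial correlation.
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
X = np.exp(rng.multivariate_normal([mu, mu], cov, size=n))   # X[:, i] = T * FR

# Second level: conditionally Poisson spike counts given the drives.
Y = rng.poisson(X)

# E(Y) = E(X), and by the law of total variance Var(Y) = E(X) + Var(X),
# so the counts are noisier than the underlying drives.
print(Y[:, 0].mean() - X[:, 0].mean())                    # near 0
print(Y[:, 0].var() - (X[:, 0].mean() + X[:, 0].var()))   # near 0
```

The law-of-total-variance check in the last line is what makes the spike counts "noisy versions" of the rates: the count variance exceeds the rate variance by the mean drive.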

It is not possible to estimate the components in equation 1.4 separately, based on spike counts alone, without additional modeling assumptions (Amarasingham, Geman, & Harrison, 2015). In conjunction with equation 1.3, Churchland et al. (2011) imposed constraints on the variance of the spike counts, constraints estimated from the data, to extract method-of-moments estimates of correlations between single-neuron firing rates at different epochs of a trial. By assuming univariate Poisson-gamma models for single-neuron spike counts and then also assuming a sampling distribution for the values of SCC across stimuli, Goris et al. (2014) obtained estimates for FRC and for a quantity related to the within-trial correlation $\xi$. Because they used univariate models for the spike counts, they were able to obtain only a single estimate of FRC for all stimuli. However, as illustrated in Figure 3A, the FRC of a neuron pair can (and often does) vary across stimuli. The approach we take allows for this. We build on previous work of Ventura, Cai, and Kass (2005) and Kass and Ventura (2006), who proved that spike count correlation necessarily increases with $T$ in the presence of multiplicative trial-to-trial variation in time-averaged input drive, and of Behseta, Berdyyeva, Olson, and Kass (2009), who used a bivariate hierarchical model as in Figure 1 to provide a correction for attenuation of correlation in the presence of point process noise. In Section 3 we derive nonparametric estimators of the components of equation 1.4 from spike trains (as opposed to from spike counts alone) by showing that certain estimators of intermediate quantities are unbiased. We also derive a nonparametric estimator of the dispersion $\phi_i$ in equation 1.2 from spike trains, in contrast to Goris et al. (2014), who assume $\phi_i = 1$, and Churchland et al. (2011), who use $\phi_i$ like a tuning parameter and choose it such that variances and correlations lie in the required domains.
We provide details on bivariate hierarchical models for spike counts that take the trial-to-trial distribution of $X_{ir}$ to be lognormal and the distribution of $Y_{ir}$ to be Poisson with mean $X_{ir}$, and we suggest bootstrap methods to assess the fit of these models. Because analysis and interpretation are more straightforward when $\xi = 0$ in equation 1.4, we propose a jitter test of this hypothesis. We show that the test is sufficiently powerful to find substantial departures from $\xi = 0$ under realistic scenarios, and because one typically must examine many pairs of neurons, we use methods that control the false discovery rate (FDR) and the false nondiscovery rate (FNR). The latter assesses the likelihood of error when a decision is made to assume $\xi = 0$ based on the jitter test. We also introduce generalized forms of tuning curve and tuning curve correlation (GTC and GTCC; see appendix H) that account for the trial-to-trial variability of the firing rates. In Section 4, we analyze the V4 data and find that the models fit well and that they support the effects discussed in Section 2. In Section 5 we discuss implications of our findings.

Figure 2 summarizes the main features of our approach, in which we begin with nonparametric estimators for exploratory purposes (e.g., making informative plots) and to formulate the jitter test. We then move on to model-based inference and, finally, assess goodness of fit and sensitivity.

## 2 Spike Count Correlation as an Attenuated Firing Rate Correlation

The attenuation effect is defined as the case when SCC and FRC have the same sign and SCC has smaller magnitude ($|\mathrm{SCC}| < |\mathrm{FRC}|$). Figure 3A shows an example neuron pair (macaque area V4; Section 4) with different attenuation strengths across stimuli, and Figure 3B shows that SCC is an attenuated version of FRC for the majority of neuron pairs in our data set. Because SCC combines FRC, the attenuation factor ATT, and the within-trial correlation $\xi$ as in equation 2.2, attenuation occurs whenever $\xi$ is small, specifically whenever $\xi$ lies between two bounds determined by ATT and FRC. The special case $\xi = 0$ is of particular interest because equation 2.2 then simplifies to $\mathrm{SCC} = \mathrm{ATT} \times \mathrm{FRC}$, so that the attenuation effect is completely specified by ATT (and the results in Kass & Ventura, 2006, which assume $\xi = 0$, hold). In Section 3.3, we develop a jitter method to test the hypothesis $\xi = 0$, and in Section 4 we show that this hypothesis is reasonable for the majority of neuron pairs in the in vivo macaque V4 data set we analyze.
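The special case can be checked numerically. With no within-trial correlation, the law of total covariance gives $\mathrm{Cov}(Y_1, Y_2) = \mathrm{Cov}(X_1, X_2)$, while point-process noise inflates each $\mathrm{Var}(Y_i)$, so SCC equals ATT times FRC with ATT the ratio of rate to count standard deviations. A sketch with illustrative parameters and conditionally Poisson counts:

```python
# Verify SCC = ATT * FRC when there is no within-trial correlation (xi = 0).
# Parameter values are illustrative only, not fitted to any data.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
mu, sigma, rho = 2.5, 0.4, 0.5

cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
X = np.exp(rng.multivariate_normal([mu, mu], cov, size=n))  # latent rates
Y = rng.poisson(X)                                          # counts, xi = 0

frc = np.corrcoef(X[:, 0], X[:, 1])[0, 1]   # firing rate correlation
scc = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]   # spike count correlation
att = np.sqrt(X[:, 0].var() * X[:, 1].var() /
              (Y[:, 0].var() * Y[:, 1].var()))

# SCC is an attenuated version of FRC: scc is close to att * frc, att < 1.
print(scc, att * frc)
```

Because the count variances in the denominator of ATT exceed the rate variances, ATT is necessarily below 1 here, which is the attenuation effect.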

## 3 Estimating Sources of Correlations

Our aim here is to estimate the population quantities in equations 2.1 and 2.2 that characterize the hierarchical structure defined in Section 2 and Figure 1. In Section 3.1 we propose bivariate parametric models for the spike counts of neuron pairs, which imply parametric estimates for the components of equations 2.1 and 2.2. These models assume lognormally distributed firing rates and conditionally Poisson spike counts, allowing for the overdispersed spike counts that are typical of the visual cortex V4 data we analyze in Section 4 (Tomko & Crapper, 1974; Tolhurst, Movshon, & Thompson, 1981; Cohen & Maunsell, 2009; Mitchell et al., 2009; Goris et al., 2014). We also provide methods to evaluate the fit of the models to data and to assess their sensitivity to the assumptions. In Section 3.2, we propose nonparametric estimators of the population quantities in equations 2.1 and 2.2 based on binned spike counts, which make assumptions only about the timescale of within-trial dependencies between neurons. Because the interpretation of equation 2.2 is more straightforward when $\xi = 0$, we develop a model-free jitter test of this hypothesis in Section 3.3.

### 3.1 Parametric Estimation

#### 3.1.1 Assessing the Fit of the Parametric Model

We assess the global goodness-of-fit (GOF) of a parametric model to data using a discrepancy measure between a nonparametric quantity and its parametric analog under the fitted model. A large discrepancy suggests that the model is not a good representation of the true spike count generation process. We consider two discrepancies: the Kullback-Leibler (KL) divergence of the fitted model (univariate or bivariate) from the empirical distribution of the data and, in the bivariate case, the divergence between the sample SCC and the SCC predicted by the fitted model.

*Kullback-Leibler Divergence.* Let $Y_r$ denote the spike count of a neuron in trial $r$ in the univariate case and the pair of spike counts in the bivariate case. Let $\hat{P}$ be the empirical probability mass function (p.m.f.) of the data (the observed counts normalized to sum to one), and let $\hat{Q}$ be a parametric model fitted to the data. The Kullback-Leibler (KL) divergence between $\hat{P}$ and $\hat{Q}$ equals, up to the factor $1/n$, the difference between the maximized log likelihoods under the two distributions. Hence, the KL divergence can be written as $\mathrm{EML} - \mathrm{PML}$, where EML stands for “empirical maximum log likelihood” and PML for “parametric maximum log likelihood.” We use this discrepancy to test the hypothesis that the data arise from the parametric model, and we approximate its null distribution, $F_{KL}$, via a parametric bootstrap (Efron & Tibshirani, 1994). We reject the hypothesis if the observed discrepancy exceeds the $(1 - \alpha)$ quantile of $F_{KL}$, for some significance level $\alpha$.
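The EML − PML logic can be sketched as follows. For brevity, the fitted model here is plain Poisson rather than the Poisson-lognormal models we actually fit, and the data are synthetic; the bootstrap mechanics are the same.

```python
# Parametric bootstrap GOF test based on the EML - PML discrepancy.
# For brevity the fitted model here is plain Poisson (the paper fits
# Poisson-lognormal models); the data below are synthetic.
import numpy as np
from math import lgamma

def eml_minus_pml(y):
    """Empirical max log likelihood minus Poisson max log likelihood;
    equals n times the KL divergence of the fitted model from the
    empirical p.m.f., and is always >= 0."""
    n = y.size
    _, counts = np.unique(y, return_counts=True)
    eml = np.sum(counts * np.log(counts / n))     # empirical p.m.f. term
    lam = y.mean()                                # Poisson MLE
    pml = sum(v * np.log(lam) - lam - lgamma(v + 1.0) for v in y)
    return eml - pml

def kl_test(y, B=200, rng=None):
    """Approximate p-value via a parametric bootstrap under the fitted model."""
    if rng is None:
        rng = np.random.default_rng(2)
    d_obs = eml_minus_pml(y)
    lam = y.mean()
    d_null = np.array([eml_minus_pml(rng.poisson(lam, y.size))
                       for _ in range(B)])
    return float(np.mean(d_null >= d_obs))

rng = np.random.default_rng(3)
y_over = rng.poisson(rng.gamma(2.0, 2.5, size=300))   # overdispersed counts
print(kl_test(y_over))   # small p-value: the Poisson model is rejected
```

Because the empirical p.m.f. maximizes the likelihood over all distributions, EML − PML is nonnegative, and large values indicate lack of fit.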

*SCC Divergence.* Let $\hat{Q}$ be a bivariate parametric model fitted to the spike count pairs. Define $D = z(\widehat{\mathrm{SCC}}) - z(\mathrm{SCC}_{\hat{Q}})$, where $\widehat{\mathrm{SCC}}$ is the sample SCC (see equation 3.12), $\mathrm{SCC}_{\hat{Q}}$ is the SCC implied by the model fitted to the data (see equation 3.3), and $z$ is the Fisher transformation. We reject the hypothesis that the data arise from the parametric model if $D$ falls below the $\alpha/2$ quantile or above the $(1 - \alpha/2)$ quantile of $F_D$, where $F_D$ is the null distribution of $D$, which we approximate using a parametric bootstrap.

#### 3.1.2 Sensitivity of FRC Estimates to Model Assumptions

Goodness-of-fit tests do not always have high power to detect deviations between the data and the assumed parametric model, so it is useful to assess how sensitive the parametric estimate of FRC is to mild deviations from the DC-PLN model. We do that by simulating spike counts from the more general model specified in equation 3.5, estimating FRC assuming that the data are DC-PLN distributed, and quantifying the deviations of the estimate from its true value.

- Under model 3.2, spike counts $Y_{ir}$ conditional on their rates $X_{ir}$ are Poisson, which implies $\phi_i = 1$ (see Section 2). Here we consider a generative model permitting $\phi_i \neq 1$, obtained by rounding a suitably scaled count variable to the closest integer; this model has conditional mean $X_{ir}$ and conditional variance approximately $\phi_i X_{ir}$ (see appendix D). Figure 4A illustrates the deviations of this distribution, with $\phi_i$ below and above 1, from the Poisson distribution with the same mean.
- Under model 3.2, the rates $(X_{1r}, X_{2r})$ are bivariate lognormal. Here we generate them from a perturbed version of the lognormal distribution, indexed by a perturbation parameter; when that parameter is zero, it is easy to show that the generated rates are exactly bivariate lognormal. Figure 4B shows the resulting rate distributions for the lognormal case and for perturbed alternatives, with parameters chosen to match typical values in the V4 data analyzed in Section 4.

Details of simulations and sensitivity results for the V4 data are in Section 4.2.

### 3.2 Nonparametric Estimation

Suppose that we observe $n$ independent spike trains of duration $T$ for neuron $i$. We divide $[0, T]$ into $m$ bins of equal length, and we let $Y_{irj}$ denote the spike count of neuron $i$ in trial $r$ and bin $j$, so that $Y_{ir} = \sum_{j=1}^{m} Y_{irj}$ is the total spike count for neuron $i$ on trial $r$. We assume $E(Y_{irj} \mid Y_{ir}) = p_{ij} Y_{ir}$, where $p_{ij}$ is the probability that, given that neuron $i$ spikes, the spike occurs in bin $j$, independent of trial $r$; in other words, the expected spike count in bin $j$ in trial $r$ is a portion $p_{ij}$ of the total spike count $Y_{ir}$ in trial $r$, with $p_{ij}$ constant across trials. From these binned counts we define nonparametric estimators of the across-trial and within-trial covariance components (the latter given in equation 3.7), which satisfy the following proposition.

*Proposition 1.* If the spikes of neurons 1 and 2, given their rates, are uncorrelated whenever they are separated by more than $K$ bins, that is, if $\mathrm{Cov}(Y_{1rj}, Y_{2rh} \mid X_{1r}, X_{2r}) = 0$ whenever bins $j$ and $h$ are such that $|j - h| > K$, then these estimators are unbiased for the across-trial and within-trial covariance components, respectively.

The proof is in appendix E. The probabilities $p_{ij}$ are typically unknown but can be estimated by the corresponding observed proportions, or smoothed versions thereof. In Section 4 we estimate $p_{ij}$ from peristimulus time histograms with 50 ms bin widths. The unbiasedness property depends on $K$: given the trial length $T$ and number of bins $m$, we need $K$ such that $K T / m \geq \tau$, where $\tau$ is such that the spikes of the two neurons are correlated only if they occur within $\tau$ of each other. If the spikes are still correlated when they are farther than $T$ apart, we cannot obtain unbiased estimates of the covariance components. Thus, Proposition 1 yields unbiased estimates only if within-trial covariabilities occur on timescales smaller than the trial duration $T$, and $m$ is sufficiently large (see appendix E). Within these constraints, we use the smallest admissible value of $K$, because the variance of the estimators increases with $K$. For the data in Section 4, Smith and Sommer (2013) found, based on jitter-corrected cross-correlograms, that short-timescale correlation occurred with a 5 to 10 ms half-width, so we use $\tau = 10$ ms to be conservative. The variance of the estimators decreases with the number of bins $m$ for three reasons:

1. The trial length $T$ is finite, so the cross-terms at the extreme indices in the summand of equation 3.7 are not available, and the ends of the spike trains therefore do not contribute to the estimate. Using small bins ($m$ large) means using more of the data, which reduces the variance of the estimators.
2. When $m$ is large, $\tau m / T$ can be an integer. Otherwise we must round $K$ up to the next integer to avoid biasing the estimators, with the consequence that the cross-terms at the largest lags in equation 3.7 may contain less signal and more noise, thereby increasing the variance of the estimators.
3. Small bins better localize the beginnings and ends of within-trial effects, which reduces the variance of equation 3.7 for the same reason as in item 2.

In Section 4 we used 50 ms bins, and thus the smallest admissible $K$. It would be statistically more efficient to use the largest $m$ possible up to recording accuracy (in our case, one bin per millisecond, since spike time accuracy is in milliseconds), but the computational burden of the within-trial covariability jitter tests (see Section 3.3) would be prohibitive (see appendix G).
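The distant-bin idea underlying these estimators can be illustrated in a stylized setting with known, uniform $p_{ij}$ (a simplifying assumption made for the sketch; this is not the paper's estimator): conditionally on the rates, counts in bins more than $K$ apart are uncorrelated, so suitably normalized cross-products of distant bins estimate $E(X_{1r} X_{2r})$, and hence the rate covariance, without contamination from within-trial correlation.

```python
# Stylized version of the distant-bin idea: estimate Cov(X1, X2) from
# cross-products of bins separated by more than K bins. Uniform p_ij are
# assumed known for simplicity; this is a sketch, not the paper's code.
import numpy as np

rng = np.random.default_rng(4)
n, m, K = 50_000, 20, 1            # trials, bins, dependence range (bins)
mu, sigma, rho = 2.5, 0.4, 0.6     # illustrative lognormal parameters

cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
X = np.exp(rng.multivariate_normal([mu, mu], cov, size=n))   # trial rates

# Given the rates, spread spikes uniformly over bins (p_ij = 1/m).
Y1 = rng.poisson(X[:, [0]] / m, size=(n, m))
Y2 = rng.poisson(X[:, [1]] / m, size=(n, m))

# For |j - h| > K, E(Y1_rj * Y2_rh | X) = (1/m^2) * X1_r * X2_r.
j, h = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
far = (np.abs(j - h) > K).astype(float)
S = np.einsum("rj,rh,jh->r", Y1 * 1.0, Y2 * 1.0, far)  # distant cross-products
EX1X2_hat = S.mean() * m**2 / far.sum()                # unbiased for E(X1*X2)
cov_hat = EX1X2_hat - Y1.sum(1).mean() * Y2.sum(1).mean()

print(cov_hat, np.cov(X[:, 0], X[:, 1])[0, 1])         # approximately equal
```

The point of the construction is that the within-trial dependence (if any) lives only at lags up to $K$, so it never enters the distant cross-products.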

#### 3.2.1 Plug-In Estimators

We use Proposition 1 to define plug-in estimators for the components of equations 2.1 and 2.2, with the sample estimates substituted for the corresponding population quantities. When the jitter test of Section 3.3 does not reject $\xi = 0$, we set $\hat{\xi} = 0$. Note that the plug-in estimators do not always lie in the required domains (Churchland et al., 2011, encountered similar problems); for example, the plug-in estimate of FRC could lie outside $[-1, 1]$. They are nevertheless useful estimates for exploratory data analysis.

We are able to estimate the components of equations 2.1 and 2.2 without specific assumptions on $\phi_i$, in contrast to the assumption that $\phi_i = 1$ in Section 3.1 and in Goris et al. (2014), and to the approach of Churchland et al. (2011), who use $\phi_i$ like a tuning parameter and choose it such that variances and correlations lie in the required domains. Here we derive a nonparametric estimate of $\phi_i$ by using the information carried by spike counts at timescales smaller than the timescale of SCC and FRC (i.e., spike counts in smaller bins).

### 3.3 Jitter Test for Within-Trial Correlation

In Section 2 we presented a hierarchical model where spike counts are trial-to-trial correlated through the rates $X_{ir}$, but also, conditionally on the rates, they might share within-trial covariability, measured by $\xi$ or ECC. The case $\xi = 0$ (equivalently, $\mathrm{ECC} = 0$) is of particular interest because then the attenuation of FRC is fully explained by the attenuation factor ATT in equation 2.2.

A parametric likelihood ratio test (LRT) of $\xi = 0$ based on model 3.2 is problematic because it is limited to the one-sided alternative $\xi > 0$, whereas $\xi$ could be negative, and because the asymptotic chi-square null distribution of the LRT does not hold when $\xi = 0$ is at the boundary of the parameter space; a bootstrap would be needed to obtain the null distribution. The parametric estimate of $\xi$ also turned out to be less efficient than the nonparametric estimate, equation 3.7, in a simulation study in which we repeatedly generated synthetic data sets similar to ours and estimated $\xi$ parametrically and nonparametrically. Both estimators were unbiased, but the nonparametric estimator had smaller variance, which translates into more power in a testing context. The difference in efficiency arises because the two estimators are obtained from different reductions of the observed spike trains: the parametric estimate of $\xi$ is based on trial-length spike counts and the nonparametric estimate on spike counts in much smaller bins. The spike trains contain more information about $\xi$ than spike counts (data processing inequality; Cover & Thomas, 2012), and presumably spike counts in smaller bins contain more information than spike counts in larger bins, since they are closer to the original data.

We test the hypothesis $\xi = 0$ by comparing the nonparametric estimate of $\xi$, equation 3.7, to $F$, the null distribution of that estimate obtained using a jitter method (Harrison et al., 2013), as follows. Let $Y_{irj}$ denote the spike count of neuron $i$ in trial $r$ and time bin $j$, such that $Y_{ir} = \sum_j Y_{irj}$ is the total spike count of neuron $i$ in trial $r$. For each of the $n$ trials, we generate jittered spike counts for neuron $i$ by redistributing the $Y_{ir}$ spikes across the bins according to the probabilities $p_{ij}$, where $p_{ij}$ is the probability that, given that neuron $i$ spikes, the spike occurs in bin $j$ (the same $p_{ij}$'s appear in section 3.1). Let $\hat{\xi}^{*}$ be the nonparametric estimator of $\xi$ calculated from these jittered spike counts. We repeat this $B$ times to obtain $B$ values of $\hat{\xi}^{*}$: their empirical distribution approximates the null distribution $F$ of $\hat{\xi}$. Obtaining accurate $p$-values requires a simulation size $B$ in the thousands. However, a normal probability plot revealed that the $\hat{\xi}^{*}$ were approximately normally distributed (not surprising, given that equation 3.7 is an average), so a faster approximation of $F$ is the normal distribution with mean 0 and variance equal to the sample variance of the $B$ values of $\hat{\xi}^{*}$, where $B$ can be much smaller (see Canty, Davison, Hinkley, & Ventura, 2006). The jitter test is a conditional parametric bootstrap test that is exact when the $p_{ij}$'s are known. In our application, we must estimate the $p_{ij}$'s (see section 3.2), so the test is no longer exact. Figure 5 shows that its spurious detection rate is nevertheless close to the significance level $\alpha$, so the test is reliable.
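The jitter null can be sketched as follows. The statistic below contrasts near-lag and distant-lag cross-products, each normalized so that it is centered at zero when there is no within-trial correlation; multinomial jitter preserves each trial's total counts (and hence the trial-to-trial correlation) while destroying any within-trial synchrony. Uniform, known $p_{ij}$ are a simplifying assumption, and the statistic is a stylized stand-in for equation 3.7, not the paper's estimator.

```python
# Stylized jitter test: near-lag minus distant-lag cross-products,
# compared against a multinomial-jitter null that keeps trial totals.
import numpy as np

rng = np.random.default_rng(5)
n, m, K = 1_000, 20, 1          # trials, bins, dependence range in bins

def xi_stat(Y1, Y2, K=1):
    """Near-lag minus distant-lag normalized cross-products.
    Centered at zero when there is no within-trial correlation."""
    _, m = Y1.shape
    j, h = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    near = (np.abs(j - h) <= K).astype(float)
    far = 1.0 - near
    s_near = np.einsum("rj,rh,jh->", Y1 * 1.0, Y2 * 1.0, near) / len(Y1)
    s_far = np.einsum("rj,rh,jh->", Y1 * 1.0, Y2 * 1.0, far) / len(Y1)
    return s_near / near.sum() - s_far / far.sum()

def jitter_zscore(Y1, Y2, B=100):
    """Z-score of the observed statistic against its multinomial-jitter null."""
    p = np.full(Y1.shape[1], 1.0 / Y1.shape[1])   # uniform p_ij (assumed)
    null = np.empty(B)
    for b in range(B):
        J1 = np.array([rng.multinomial(t, p) for t in Y1.sum(1)])
        J2 = np.array([rng.multinomial(t, p) for t in Y2.sum(1)])
        null[b] = xi_stat(J1, J2)
    return (xi_stat(Y1, Y2) - null.mean()) / null.std()

# Rates correlated across trials, but no within-trial synchrony.
cov = 0.4**2 * np.array([[1.0, 0.6], [0.6, 1.0]])
X = np.exp(rng.multivariate_normal([2.5, 2.5], cov, size=n))
Y1 = rng.poisson(X[:, [0]] / m, size=(n, m))
Y2 = rng.poisson(X[:, [1]] / m, size=(n, m))
z_no_sync = jitter_zscore(Y1, Y2)

# Inject spikes that land in the same bin for both neurons.
common = rng.poisson(0.15, size=(n, m))
z_sync = jitter_zscore(Y1 + common, Y2 + common)

print(z_no_sync, z_sync)   # modest |z| vs. clearly large z
```

Because the statistic is insensitive to the trial-level rate correlation (which inflates near and distant lags equally), only genuine within-trial synchrony pushes the z-score away from zero.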

#### 3.3.1 Statistical Power

#### 3.3.2 False Discovery Rate Control

When many tests are performed, it is useful to cap the expected proportion of false rejections among all rejections—the false discovery rate (FDR)—below some desired level $\alpha^{*}$. Benjamini and Hochberg (1995) proceed as follows: given $M$ independent tests with null hypotheses and corresponding $p$-values, they reject the hypotheses with the $K$ smallest $p$-values, where $K = \max\{j : p_{(j)} \leq j\,\alpha^{*} / M\}$ and $p_{(j)}$ is the $j$th smallest $p$-value. Then, in expectation, fewer than a proportion $\alpha^{*}$ of the $K$ rejected tests correspond to true null hypotheses. Benjamini and Yekutieli (2001) show that this procedure also controls the FDR in the case of positive regression-dependent tests, but they advocate a more conservative FDR control for tests with general dependencies, which consists of replacing $\alpha^{*}$ with $\alpha^{*} / \sum_{j=1}^{M} (1/j)$.
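For concreteness, the step-up rule can be sketched as follows (a straightforward implementation of the published procedure):

```python
# Benjamini-Hochberg step-up procedure for FDR control.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.10):
    """Reject the K hypotheses with the smallest p-values, where
    K = max{j : p_(j) <= j * alpha / M}; controls the FDR at level alpha."""
    p = np.asarray(pvals, dtype=float)
    M = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, M + 1) / M
    K = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(M, dtype=bool)
    reject[order[:K]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.320, 0.900]
print(benjamini_hochberg(pvals, alpha=0.05))  # rejects the two smallest
```

The Benjamini-Yekutieli variant for general dependence simply replaces `alpha` with `alpha / sum(1/j for j in 1..M)` before applying the same rule.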

The jitter tests are dependent since they are applied to all neuron pairs, but they are not necessarily positive regression dependent, so we investigated the performance of both procedures for data like ours. We simulated independent trials of spike trains from an $N$-variate Poisson-lognormal model (we extended model 3.1 from two neurons to $N$ neurons), with parameters assigned values that match the V4 data in section 4, such that half of the neuron pairs had FRC below 0 and half had FRC above 0. Neurons were randomly assigned to three equally sized groups. For each trial $r$, the spike trains of all neurons in a group were contaminated by the same noise spike train, with the noise spike trains independent between groups and their rates chosen to match approximately the values calculated in the data. The proportion of tests with $H_0$ false was therefore one-third. We performed the jitter tests for all neuron pairs and recorded the observed FDR and false nondiscovery rate (FNR), that is, the expected proportion of false nulls among the nonrejections, for a range of nominal FDR levels $\alpha^{*}$. We repeated the simulation many times to obtain accurate mean estimates of the observed FDR and FNR. In Figure 5B, we show that the Benjamini and Hochberg (1995) procedure provides reliable FDR control for data like ours (the observed FDR is below the nominal level), perhaps because the tests were only weakly dependent, as suggested by the Brownian covariance test of Székely and Rizzo (2009). We also observe an FNR below 10% over this range, which means that most cases with $\xi \neq 0$ are detected by the jitter test under FDR control. The Benjamini and Yekutieli (2001) procedure for tests with general dependencies provided too conservative a control (smaller FDR and larger FNR; results not shown), so we did not use it.

## 4 Analysis of V4 Data

We analyze the rhesus macaque monkey (*Macaca mulatta*) visual area V4 data collected by Smith and Sommer (2013) during the following task (see Figure 6). The animal fixated a small blue dot in the center of a computer screen while a sinusoidal grating covered the aggregate receptive field area of the recorded neurons. The grating’s size and spatial and temporal frequencies were fixed, and the direction of drift was one of 12 equally spaced directions for each trial; independent trials were recorded for each of the 12 stimuli (directions). The neural data were acquired from a Utah array (100 electrodes, Blackrock Microsystems) implanted in V4. The electrode voltages were amplified and band-pass filtered (250 Hz to 7.5 kHz) prior to spike sorting with custom software (Kelly et al., 2007). All procedures were approved by the Institutional Animal Care and Use Committee of the University of Pittsburgh and were in compliance with the guidelines set forth in the National Institutes of Health’s Guide for the Care and Use of Laboratory Animals.

For our statistical analysis, we retained only the 53 well-isolated single units with a minimum signal-to-noise ratio of 2.75 (Kelly et al., 2007) and a minimum trial-averaged firing rate of 1 Hz, and we studied the relationships between all neuron pair-stimulus combinations. We proceeded as outlined in Figure 2. We first applied the jitter test (see section 3.3) with a false discovery rate (FDR) of 10% to all neuron pair-stimulus combinations to determine which pairs have $\xi \neq 0$. Only a minority of the V4 neuron pairs exhibited significant within-trial covariance (6.04% using a two-sided test, and 10.07% using a one-sided test with alternative $\xi > 0$), which suggests that within-trial correlations have a weak impact on SCC, and therefore that the attenuation effect is mainly characterized by ATT in equation 2.2. Figure 7B shows the distribution of the nonparametric estimates of ATT (see equation 3.8); it is concentrated around 0.35, which means that the magnitude of the observed SCCs is on average only 35% of the magnitude of the FRCs.

Next we fitted the parametric models in Section 3.1 in accordance with the jitter test results (that is, we fitted the DC-PLN model, equation 3.2, if the test rejected $\xi = 0$, and the PLN model, equation 3.1, otherwise) using the maximum likelihood algorithm 2 in appendix C, and we obtained estimates of the components of equation 2.2. A component of the model fitted to a neuron pair is shown in Figure 8; goodness-of-fit diagnostics and sensitivity to parametric assumptions are summarized in sections 4.1 and 4.2. Figure 3B shows the sample estimates of SCC in equation 3.12 versus the model-based estimates of FRC in equation 3.3 for all 16,536 neuron pair-stimulus combinations. The SCC is substantially attenuated compared to the FRC, which is consistent with the nonparametric estimates of ATT being well below 1 in Figure 7B. Figures 7C and 7D further show that, on average, SCC and FRC decrease with pairwise interneural distance (distance between electrodes) and increase with tuning curve correlation (TCC; see equation H.1), which is a measure of similarity between the neural activity of two neurons. Smith and Sommer (2013) had noted these relationships for SCC but not FRC. The sample estimate of ECC, equation F.1, also decreases with interneural distance and increases with TCC, although this is hard to see by eye because ECC is close to zero.

In appendix H, we also display SCC, FRC, and ECC against the generalized tuning curve correlation (GTCC), a combined measure of tuning curve similarity and firing rate covariability between a neuron pair, which is motivated in Figure 8. This figure displays the estimated joint generalized tuning curve (GTC) of a neuron pair, that is, the estimated joint distribution of their firing rates for all stimuli; it shows not only how the neurons’ average firing rates vary jointly with the stimulus—the TCC provides a one-number summary of that covariability across stimuli—but also how differently these rates are correlated across trials for different stimuli. For the neuron pair shown in Figure 8, these rates are virtually uncorrelated when stimulus 7 is presented and very strongly positively correlated for stimuli 2, 9, and 12. The GTCC is a one-number summary of both tuning curve similarity and firing rate covariability between a neuron pair.

### 4.1 Parametric Model Goodness of Fit

Figures 3B, 7C, and 7D display the parametric estimates of FRC. Here we verify that the parametric models fit the data well.

Figure 9A shows the sample variance versus the sample mean of the spike counts of the 53 neurons for all 12 stimuli: the spike counts are overdispersed, which is a feature of the Poisson-lognormal model. The KL divergence test (see Section 3.1) confirms that the univariate Poisson-lognormal model (i.e., the univariate version of equation 3.1) fits single-neuron spike counts well; only 5% of the tests were rejected, which matches the spurious detection rate $\alpha = 5\%$. To provide a contrasting result, we also performed KL divergence tests assuming Poisson spike counts. Their *p*-values are plotted in Figure 9B against the spike counts’ Fano factors: 52% of the Poisson tests were rejected, including all the tests applied to data with Fano factors above 0.6. The *p*-values of the univariate Poisson-lognormal KL divergence tests are also plotted and exhibit no relationship with the Fano factor.
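The overdispersion diagnostic rests on a closed-form property: if $Y$ is Poisson with lognormal rate $X$, the law of total variance gives a Fano factor $\mathrm{Var}(Y)/E(Y) = 1 + E(X)\,(e^{\sigma^2} - 1) > 1$, versus exactly 1 for a plain Poisson. A quick check with illustrative parameters:

```python
# Poisson-lognormal counts are overdispersed; plain Poisson counts are not.
# Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(6)
n, mu, sigma = 100_000, 2.0, 0.5

X = np.exp(rng.normal(mu, sigma, n))     # lognormal rates
Y_pln = rng.poisson(X)                   # Poisson-lognormal counts
Y_pois = rng.poisson(X.mean(), n)        # plain Poisson with the same mean

def fano(y):
    return y.var() / y.mean()

print(fano(Y_pln), 1.0 + X.mean() * (np.exp(sigma**2) - 1.0), fano(Y_pois))
```

The first two printed values agree (the simulated Fano factor matches the closed form), while the plain Poisson counts sit at a Fano factor of 1, which is why a variance-versus-mean plot separates the two models so cleanly.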

Now we assess the fits of the PLN and DC-PLN models in equations 3.1 and 3.2 to pairs of spike counts. We randomly selected 2500 cases (about 15% of the 16,536 neuron pair-stimulus combinations) to which we applied the KL and SCC divergence tests (see Section 3.1). Only about 4% of the neuron pairs did not pass the SCC divergence test, and fewer than 1.5% did not pass the KL test, at the 5% significance level, which suggests that the bivariate parametric models fit pairs of spike counts well.

### 4.2 Sensitivity of FRC Estimates to Model Assumptions

Figures 3B, 7C, and 7D display the parametric estimates of FRC, where in about 90% of cases we imposed $\xi = 0$ in accordance with the results of the jitter test. This test has a low false nondiscovery rate (FNR; Figure 5B) and high power to detect substantial deviations from $\xi = 0$ (see Figure 5A), so we are confident that we fitted the appropriate model, the PLN model, in such cases. However, we would like to know how the parametric estimate of FRC is affected by assuming $\xi = 0$ when the jitter test fails to detect that $\xi$ is small and positive, and how sensitive it is to mild deviations from the Poisson and lognormal assumptions.

We applied the methods in section 3.1.2. We generated trials of spike count pairs from the generalized bivariate spike count model, equation 3.5, over a grid of values of the dispersion, lognormality, and within-trial correlation parameters, with the remaining parameters calibrated in each case so that the spike counts had means of 7 and variances matching typical values in the V4 data. We used these data to estimate FRC under model 3.1 and repeated the simulation 10,000 times. Figure 10A shows the mean of the repeated values of the FRC estimate, and Figure 10B shows its standard deviation relative to what it would be if the spike counts were generated from the estimation model, equation 3.1, as functions of the generating model parameters. The deviations from lognormality have practically no effect on the FRC estimate, and the deviations from conditional Poissonness have moderate effects; that is, the estimate is fairly robust to mild deviations from the Poisson-lognormal model. However, deviations from $\xi = 0$ increase the bias of the FRC estimate substantially and reduce its variance,^{2} so inferences are not reliable (for example, a confidence interval for the true FRC would be centered on the wrong value and have lower coverage than expected). The possibility of this happening in our data is small because the FNR is less than 10% (see Figure 5B).

## 5 Discussion

We have provided a collection of methods for eliminating Poisson-like noise from the SCC to produce the FRC. As shown in Figure 3B, FRC tends to be much larger in magnitude than SCC and should be much more sensitive to experimental manipulations. A relatively simple bivariate hierarchical model may be introduced when the within-trial correlation is sufficiently small that its impact on FRC is negligible, and we proposed a jitter test of this assumption. For the majority of pairs of V4 neurons, the jitter test failed to reject the null hypothesis of no within-trial correlation. Careful analysis requires an evaluation of the power of this test and of the sensitivity of the proposed parametric model to mild departures from its assumptions, which we supplied, along with an evaluation of the expected false nondiscovery rate (FNR) of the test, which we determined to be low. Together, these analyses strongly suggest that in our data, within-trial correlation rarely distorts the estimate of FRC when the jitter test does not reject the null hypothesis. For pairs of neurons where the hypothesis of no within-trial correlation was rejected, we used a nonparametric estimate of within-trial correlation and then applied a correlated Poisson hierarchical model that allows for positive within-trial correlation. A more complete accounting of within-trial variation, which would include negative within-trial correlation, could be obtained from point process models (Snyder & Miller, 1991; Kass et al., 2014) and is the subject of future research.

Our finding that only a few V4 neuron pairs exhibited within-trial correlations differs from the result of Goris et al. (2014) in macaque visual area V1. First, their SCCs are larger than those in our V4 data; a similar discrepancy is documented by comparing the V1 analysis of Smith and Kohn (2008) with the V4 studies of Smith and Sommer (2013), Cohen and Maunsell (2009), and Mitchell et al. (2009). Second, the attenuation effect in Goris et al. (2014) is much weaker than ours, so that their SCC is only weakly attenuated relative to FRC (called “gain correlation” in that paper), while their point process covariability is much higher (ours is measured by or ECC). These differences may be due to the different areas under study. Differences in anesthetic state between the two experiments (our animal was alert) might also have induced different levels of up and down states (Steriade & Buzsaki, 1990; Cowan & Wilson, 1994; Timofeev et al., 2001; Steriade et al., 2001), which in turn could have affected within-trial covariability.

Our overall goal was to provide a useful suite of methods for examining the trial-to-trial correlated behavior of neurons at trial-length timescales. To the extent that neural implementations of maximum likelihood and Bayesian inference are possible (e.g., Ma, Beck, Latham, & Pouget, 2006), downstream neural networks should be able to strip away noise from spike counts and use FRC to advantage, just as we have been able to do statistically. Whether they do so remains an open question.

## Appendix A: Spike Count Correlation Formula

## Appendix B: Lognormal Distribution

## Appendix C: Doubly Correlated Poisson-Lognormal Distribution

We can compute the ML estimators of numerically using algorithm 1, when only the spike counts are available, or algorithm 2, when the spike trains are available. In data simulated as in section 4.2, we found that algorithm 2 typically yielded lower-variance estimates of FRC. We therefore recommend algorithm 2, keeping in mind that algorithm 1 could provide better estimates of other quantities. Note that if we assume , that is, the PLN model, then we just use step 2 of algorithm 2 with .
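The Poisson-lognormal likelihood required by these algorithms has no closed form; the following is a minimal sketch of how the pmf can be evaluated numerically via Gauss-Hermite quadrature (a standard device; the function names are ours, and this is not necessarily the scheme used in algorithms 1 and 2):

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

def pln_pmf(k, mu, sigma, n_nodes=60):
    """P(X = k) under the Poisson-lognormal model: the Poisson pmf with
    rate exp(mu + sigma*Z), Z ~ N(0,1), integrated out by Gauss-Hermite
    quadrature (nodes/weights are for the weight function exp(-x^2))."""
    x, w = hermgauss(n_nodes)
    log_rates = mu + sigma * math.sqrt(2.0) * x
    # log Poisson pmf, computed in logs to avoid overflow at large rates
    log_pois = k * log_rates - np.exp(log_rates) - math.lgamma(k + 1)
    return float(np.sum(w * np.exp(log_pois)) / math.sqrt(math.pi))

def pln_loglik(counts, mu, sigma):
    """Log-likelihood of i.i.d. PLN counts; can be handed to a numerical
    optimizer to obtain ML estimates of (mu, sigma)."""
    return float(sum(math.log(pln_pmf(k, mu, sigma)) for k in counts))

# Sanity checks: the pmf sums to 1, and reduces to Poisson as sigma -> 0.
print(sum(pln_pmf(k, 1.0, 0.5) for k in range(400)))  # close to 1.0
```

The bivariate versions needed for the DC-PLN model replace the one-dimensional quadrature with a product rule over two latent gaussian variables; the structure of the computation is otherwise the same.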

## Appendix D: Generalized Count Model

Let have the distribution in equation 3.6. Thus,

- •
If , then , and , where , such that .

- •
If , then and .

- •
If , then and .

## Appendix E: Nonparametric Estimators of and

The proof that is an unbiased estimator of is similar; all we need is to replace and by and .

Note that combining the conditions and of section 3.2 yields , which implies and . These conditions are necessary to obtain unbiased estimators.

## Appendix F: Expected Conditional Spike-Count Correlation

## Appendix G: Computational Time

We implemented our estimation and testing procedures in R, using custom code and modifications of the package to estimate the DC-PLN model, equation 3.2. We used a laptop with an Intel Core 2 Duo T6600 CPU ( GHz), 4 GB of RAM (DDR, 800 MHz), running Ubuntu 15.04. Table 1 displays the time to compute (see equation 3.7) and to perform the jitter test for a neuron pair, and Table 2 the time to estimate the parametric model. The time required to process a data set with *N* neurons and *S* stimuli can be obtained by multiplying the values in the tables by . For example, in section 4, the multiple jitter tests required about minutes.

Table 1: Time to compute equation 3.7 and to perform the jitter test (in parentheses) for one neuron pair.

| . | *m* | *n* = 60 | *n* = 80 | *n* = 100 |
| --- | --- | --- | --- | --- |
| 0.01 | 50 | 0.3 (207) | 0.3 (275) | 0.4 (346) |
| | 100 | 0.8 (322) | 0.9 (426) | 1.1 (526) |
| | 200 | 2.1 (603) | 2.5 (867) | 3.4 (976) |
| 0.02 | 50 | 0.6 (235) | 0.6 (289) | 0.8 (357) |
| | 100 | 1.2 (354) | 1.3 (492) | 1.8 (581) |
| | 200 | 3.3 (744) | 4.3 (914) | 5.5 (1202) |


Note: The time to perform the jitter test based on (section 3.3), with 100 bootstrap samples and a 20-bin PSTH, is indicated in parentheses.

Table 2: Time to estimate the parametric model for one neuron pair.

| . | *n* = 60 | *n* = 80 | *n* = 100 |
| --- | --- | --- | --- |
| Estimate assuming | 240 | 293 | 383 |
| Estimate (algorithm 1) | 4516 | 6364 | 7494 |
| Estimate (algorithm 2, step 2) | 2257 | 2769 | 3721 |


Note: Time for step 1 of algorithm 2 can be found in Table 1.

### G.1 Choosing *m* of

In section 3.1 we argued that the number of bins *m* should be large. However, the computational burden of the jitter test may then become excessive (see Table 1): the number of elements in the summand in equation 3.7 is , where , so that for a fixed ratio , the computational time of equation 3.7 is proportional to . To set an upper bound *D* on *M* given , we use and .

## Appendix H: Generalized Tuning Curve Correlation

The tuning curve of a neuron *i* is generally defined as the spike count averaged across trials, , which varies with the stimulus *s* when *s* modulates the neuron (Butts & Goldman, 2006). Under the assumptions of section 2, the tuning curve reduces to , which contains information about the firing rate averaged across trials but none about the trial-to-trial variability of . Instead of reducing the tuning properties to the mean firing rate, we may retain in addition the full distribution of , say *F _{is}*, for each stimulus *s*: this ensemble is what we call the generalized tuning curve (GTC). Figure 11A displays the GTC for two example V4 neurons, assuming the Poisson-lognormal model, equation 3.1, for their spike counts. Figure 8 shows the joint GTC for a neuron pair, which further displays their shared trial-to-trial covariability.

The tuning curve correlation (TCC), equation H.1, is defined with respect to a random variable *S*, which codes for the stimulus; for example, when all stimuli are presented with equal frequency, *S* is uniform over the stimuli. Consider the toy example where, for a stimulus *s*, are (trial-to-trial) random variables with generative model: where we choose to achieve . This model implies for all *s* and, in turn, , from which one might conclude that the two neurons have proportional tuning curves. But the model also implies for all *s*, which reveals that TCC only summarizes the similarity of the expected firing rates of the two neurons across stimuli but does not convey the sign and magnitude of their shared trial-to-trial variability. This motivates us to define the generalized tuning curve correlation: where the correlation is with respect to the stimulus random variable *S*, as in equation H.1, but also to the trial-to-trial random firing rates . Hence, the TCC only captures the covariability of the expected firing rates across stimuli, while the GTCC also accounts for changes in their joint distribution, such as trial-to-trial firing rate correlation (FRC). That is, the GTCC summarizes the strength of the relationship between two neurons across both trials and stimuli.

When the stimulus is fixed and only the trial *r* varies, then , since for all *r*. When both the stimulus and the firing rates are random, the GTCC can be viewed as an approximate average of FRC across stimuli, or as an approximate average of TCC across trials. More precisely, we can rewrite , where depends on the joint distributions of the firing rates at different stimuli (Figure 8 shows such a distribution), so that GTCC can be viewed as a modified version of TCC. Figure 11B shows GTCC versus TCC for all the neuron pairs in the V4 data: GTCC has smaller magnitude, and GTCC and TCC sometimes have different signs. Figure 11C shows SCC, FRC, and ECC as functions of GTCC; it resembles the plot of the same quantities versus TCC in Figure 7B, although the relationships appear stronger. This is because GTCC is approximately an average of FRC across stimuli, as argued above, whereas SCC is an attenuated version of FRC.
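A toy computation in the spirit of the example above (with hypothetical numbers) shows how TCC can equal 1 under proportional mean tuning while GTCC is pulled away from 1 by trial-to-trial rate covariability that the TCC cannot see; here the within-stimulus rate correlation is made perfectly negative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_stim, n_trials = 8, 5_000
stim_means = np.linspace(4.0, 12.0, n_stim)     # hypothetical tuning, neuron 1

# Proportional mean tuning (so TCC = 1) with anticorrelated
# trial-to-trial rate fluctuations within every stimulus.
e = rng.normal(0.0, 1.0, size=(n_stim, n_trials))
rate1 = stim_means[:, None] + e
rate2 = 2.0 * stim_means[:, None] - e

# TCC: correlation of the trial-averaged rates across stimuli.
tcc = np.corrcoef(rate1.mean(axis=1), rate2.mean(axis=1))[0, 1]

# GTCC: correlation over the joint (stimulus, trial) ensemble,
# with the stimulus uniform over the presented set.
gtcc = np.corrcoef(rate1.ravel(), rate2.ravel())[0, 1]
print(f"TCC ~ {tcc:.2f}, GTCC ~ {gtcc:.2f}")    # GTCC falls below TCC
```

Because the means are exactly proportional, TCC is essentially 1, while GTCC is strictly smaller: it blends the across-stimulus agreement of the means with the negative within-stimulus rate correlation, exactly the information TCC discards.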

## Acknowledgments

This work was supported by NIH grant R01 MH064537. Matthew A. Smith was supported by NIH grants R00EY018894, R01EY022928, and P30EY008098; by a career development grant and an unrestricted award from Research to Prevent Blindness; and by the Eye and Ear Foundation of Pittsburgh. We used the R packages poilog, by Vidar Grøtan and Steinar Engen; energy, by Maria L. Rizzo and Gabor J. Székely; and locfit, by Catherine Loader.

## References

## Notes

^{1}

A sufficient condition for the denominator of equation 3.7 to be strictly positive is for all , since implies , where . The *p _{ij}* could be smoothed to achieve for low firing rate neurons.