## Abstract

Pyramidal neurons recorded from the rat hippocampus and entorhinal cortex, such as place and grid cells, have diverse receptive fields, which are either unimodal or multimodal. Spiking activity from these cells encodes information about the spatial position of a freely foraging rat. At fine timescales, a neuron’s spike activity also depends significantly on its own spike history. However, due to limitations of current parametric modeling approaches, it remains a challenge to estimate complex, multimodal neuronal receptive fields while incorporating spike history dependence. Furthermore, efforts to decode the rat’s trajectory in one- or two-dimensional space from hippocampal ensemble spiking activity have mainly focused on spike history–independent neuronal encoding models. In this letter, we address these two important issues by extending a recently introduced nonparametric neural encoding framework that allows modeling both complex spatial receptive fields and spike history dependencies. Using this extended nonparametric approach, we develop novel algorithms for decoding a rat’s trajectory based on recordings of hippocampal place cells and entorhinal grid cells. Results show that both encoding and decoding models derived from our new method performed significantly better than state-of-the-art encoding and decoding models on 6 minutes of test data. In addition, our model’s performance remains invariant to the apparent modality of the neuron’s receptive field.

## 1 Introduction

A fundamental goal in neuroscience is to establish a functional relationship between a sensory stimulus or induced behavior and the associated neuronal response. For example, in a rodent experiment (O’Keefe, 1979; Hasselmo, 2008), microelectrodes are implanted into a rat’s hippocampus and entorhinal cortex, and the activities of ensembles of neurons are recorded while the animal freely forages in an open arena (see Figure 1A). The goals of such an experiment are to understand how a single neuron or population of neurons encodes spatial information about the rat’s position and to determine if we can decode the animal’s position based on their spiking activity (Brown, Frank, Tang, Quirk, & Wilson, 1998; Kloosterman, Layton, Chen, & Wilson, 2014).

In the experiment described above, pyramidal neurons recorded from the rat entorhinal cortex and hippocampus have shown complex grid-like or place-like spiking patterns as a function of the rat’s position in two-dimensional (2D) environments (O’Keefe, 1979; Moser, Kropff, & Moser, 2008; Hasselmo, 2008; see Figure 1). However, it is well known that these neurons fire in a non-Poisson fashion and that their spiking also depends on their own spiking history. For instance, bursting patterns have been found to be a significant component of hippocampal neuronal spiking activity (Barbieri, Quirk, Frank, Wilson, & Brown, 2001; Fenton & Muller, 1998). Thus, encoding and decoding models that capture spike history dependencies will likely yield better performance (Truccolo, Eden, Fellows, Donoghue, & Brown, 2005; Pillow et al., 2008). However, this well-known intrinsic phenomenon has not been fully explored in the rodent hippocampal decoding literature (Brown et al., 1998; Zhang, Ginzburg, McNaughton, & Sejnowski, 1998; Barbieri et al., 2004; Santaniello, Montgomery, Gale, & Sarma, 2012; Kloosterman et al., 2014). The fact that the receptive field has a complex and multimodal shape (especially in 2D space) complicates such modeling: parametric models may be insufficient to characterize a complex, multimodal spatial receptive field structure and history dependence simultaneously.

Nonparametric modeling assumes no explicit probability distribution (i.e., it is distribution free) and lets the data speak for themselves; it is therefore more flexible for statistical data analysis. In the literature, several nonparametric methods have been developed for modeling neuronal spike activity (Chen, Kloosterman, Layton, & Wilson, 2012; Kloosterman et al., 2014; Rad & Paninski, 2010; Coleman & Sarma, 2010). Specifically, Chen et al. (2012) and Kloosterman et al. (2014) proposed an encoding framework based on kernel density estimation methods for capturing complex dependencies between spiking features and the rat’s position in a 1D environment. However, their model did not incorporate spike history dependence. In other work, Theis, Chagas, Arnstein, Schwarz, and Bethge (2013) used a semiparametric mixture-of-gaussians model to estimate ganglion cell responses that incorporates spike history effects and showed improvement over standard parametric models. Nonparametric methods, such as histograms and radial basis functions, have also been used for dimensionality reduction in fitting neuronal response functions (Sharpee, Rust, & Bialek, 2004; Williamson, Sahani, & Pillow, 2015).

Inspired by these works, we applied nonparametric algorithms to develop encoding models that use both spatial and spike history information for neurons recorded from a rat’s hippocampus and entorhinal cortex. Further, we proposed a novel nonparametric decoding algorithm and applied it to reconstruct the rat’s trajectory using spiking activity from these cells. Our nonparametric framework is based on a novel and efficient nonparametric maximum likelihood (ML) density estimator over a set of densities whose square root is band-limited (BL) (Agarwal, Chen, & Sarma, 2015). The procedure uses convex optimization with quadratic time complexity for encoding (for details, see appendix A) and linear time complexity (in terms of number of neurons per time step) for decoding.

Finally, we use 6 minutes of unseen data from these 53 cells to evaluate the performance of our nonparametric models and compare them with state-of-the-art parametric models with and without spike history covariates. Results show that the performance of our nonparametric encoding and decoding models with spike history is significantly better than that of the state-of-the-art parametric models. The result remains robust across the population of neurons with both unimodal and multimodal receptive fields.

## 2 Methods

### 2.1 Notation

Let *t* denote a continuous-time index, and let *c* denote the neuronal index within the population. We use the lowercase *p* to denote a probability density function (pdf) and the uppercase *P* to denote a probability. Further, the terms *unimodal* and *multimodal* cells are used throughout as descriptive labels; the cells are not definitively classified as such.

### 2.2 Data Collection

In the experimental protocol, a Long-Evans rat was freely foraging in an open field arena (1 m radius) for a period of 36.36 minutes. Custom microelectrode drives with variable numbers of tetrodes were implanted in the rat’s medial entorhinal cortex and dorsal hippocampal CA1 area. Spikes were acquired with a sampling rate of 31.25 kHz and filter settings of 300 Hz to 6 kHz. Two infrared diodes alternating at 60 Hz were attached to the microelectrode array drive of the animal for position tracking. Spike sorting was accomplished using a custom manual clustering program (Xclust, M. A. Wilson). All procedures were approved by the MIT Institutional Animal Care and Use Committee.

In this selected recording session, neurons were recorded from both the entorhinal cortex (*n* = 8) and the dorsal hippocampus (*n* = 66). Out of these 74 neurons, 21 (6 from entorhinal cortex) were discarded, as they fired either independently of position (i.e., putative interneurons) or at very low rates (<0.175 Hz). Of the remaining 53 neurons, 27 (1 from entorhinal cortex, 26 from hippocampus) were unimodal and 26 (1 from entorhinal cortex, 25 from hippocampus) were multimodal. The *x* and *y* coordinates of the rat’s position when these cells spiked are shown as scatter plots in Figure 2.

### 2.3 Neuronal Encoding and Decoding Models

The encoding model for each cell *c* is its conditional intensity function (CIF), $\lambda^c(t \mid x_t, y_t, h^c_t)$, as in Brown, Barbieri, Ventura, Kass, and Frank (2002) and Truccolo et al. (2005), where $(x_t, y_t)$ denotes the rat’s position at time *t* in the two-dimensional (2D) environment, and the vector $h^c_t$ consists of history covariates for cell *c* up to time *t*, as in Kass and Ventura (2001), Sarma et al. (2012), Sarma et al. (2010), Santaniello et al. (2012), Kahn, Schieber, Thakor, and Sarma (2011), and Agarwal, Sarma, Thakor, Schieber, and Massaquoi (2015).

The decoding model is the inverse of the encoding model and is used to predict the external covariates $(x_t, y_t)$, at any given time *t*, from the spiking activity of the population of *C* cells up to time *t*. In this letter, we realize the decoding model by using the maximum a posteriori (MAP) estimator that maximizes the posterior probability $p(x_t, y_t \mid \Delta^1_t, \ldots, \Delta^C_t)$, where $\Delta^c_t$ denotes the time since the last spike for cell *c* at time *t* and $p(\Delta^1_t, \ldots, \Delta^C_t \mid x_t, y_t)$ denotes the likelihood.

At any time *t*, we assume that the likelihood of observing the times since the last spike $\Delta^c_t$ from the *c*th neuron ($c = 1, \ldots, C$), given the rat’s spatial position at time *t*, has a factorial form: the neurons are assumed to fire conditionally independently given the position, so that

$$p(\Delta^1_t, \ldots, \Delta^C_t \mid x_t, y_t) = \prod_{c=1}^{C} p(\Delta^c_t \mid x_t, y_t)$$

for any $(x_t, y_t)$. This allows us to write the MAP estimator as

$$(\hat{x}_t, \hat{y}_t) = \arg\max_{x, y} \, P(x, y) \prod_{c=1}^{C} p(\Delta^c_t \mid x, y).$$
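The factorized MAP estimate can be sketched as a grid search in which per-neuron log-likelihoods add under the conditional-independence assumption. This is an illustrative sketch only; the function `map_decode`, its interface, and the toy likelihoods are hypothetical, not the paper's implementation.

```python
import numpy as np

def map_decode(grid_xy, log_lik_fns, log_prior=None):
    """MAP position estimate from conditionally independent per-neuron
    likelihoods, maximized over a 2D grid (hypothetical interface).

    grid_xy     : (G, 2) array of candidate (x, y) positions.
    log_lik_fns : list of callables; log_lik_fns[c](xy) returns the (G,)
                  log-likelihood of neuron c's observation at each position.
    log_prior   : optional (G,) log-prior; uniform if None.
    """
    log_post = (np.zeros(len(grid_xy)) if log_prior is None
                else np.asarray(log_prior, float).copy())
    # Conditional independence given position: log-likelihoods add across neurons.
    for ll in log_lik_fns:
        log_post += ll(grid_xy)
    return grid_xy[np.argmax(log_post)]

# Toy usage: two neurons whose likelihoods both peak at (0.5, 0.5).
xs = np.linspace(0.0, 1.0, 21)
grid = np.array([(x, y) for x in xs for y in xs])
peak = np.array([0.5, 0.5])
fns = [lambda g: -np.sum((g - peak) ** 2, axis=1),
       lambda g: -2.0 * np.sum((g - peak) ** 2, axis=1)]
print(map_decode(grid, fns))  # -> [0.5 0.5]
```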

#### 2.3.1 Parametric Models

*Encoding.* In the context of a generalized linear model (GLM) framework, we assume that the CIF has a log-linear form; namely, the logarithm of the CIF is linear in its parameters (Truccolo et al., 2005). Inspired by the past successes of gaussian and Zernike polynomial representations in modeling 2D hippocampal place cells (Barbieri et al., 2004), these two representations are chosen to estimate the CIFs. In addition, spike history dependencies are empirically modeled with spike count history covariates, each counting the number of spikes within a past time window (in ms); the number of history windows is governed by the hyperparameter *J*. Specifically, the CIF for each neuron *c* is modeled in the following two ways:

- • $\log \lambda^c_G(t) = \alpha^c_0 + \alpha^c_1 x_t + \alpha^c_2 y_t + \alpha^c_3 x_t^2 + \alpha^c_4 y_t^2 + \alpha^c_5 x_t y_t + \sum_{j=1}^{J} \gamma^c_j n^c_{t-j}$
- • $\log \lambda^c_Z(t) = \sum_{k} \theta^c_k z_k(x_t, y_t) + \sum_{j=1}^{J} \gamma^c_j n^c_{t-j}$

where $\{\alpha^c, \gamma^c\}$ and $\{\theta^c, \gamma^c\}$ are the unknown GLM parameters for the gaussian and Zernike models, respectively; $z_k$ denotes the fourth-order Zernike polynomials as used in Barbieri et al. (2004), and $n^c_{t-j}$ denotes the spike count history for cell *c* at time *t*. Note that since the log of the CIF is modeled as a quadratic function in $(x_t, y_t)$, the CIF has a gaussian shape.
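As a concrete illustration of the log-linear form, the sketch below evaluates a log CIF that is quadratic in position (hence gaussian-shaped spatial tuning) with linear spike-count history terms. The helper name `log_cif_gq`, the coefficient values, and the two-window history are illustrative assumptions, not fitted values from the paper.

```python
import numpy as np

def log_cif_gq(x, y, hist_counts, beta, gamma):
    """Log CIF, quadratic in position plus linear spike-count history terms.

    beta  : 6 spatial coefficients for the features [1, x, y, x^2, y^2, xy]
    gamma : J history coefficients, one per past spike-count window
    """
    feats = np.array([1.0, x, y, x * x, y * y, x * y])
    return feats @ beta + np.dot(gamma, hist_counts)

# A cell with a gaussian place field centered at (0, 0) and mild burst facilitation.
beta = np.array([2.0, 0.0, 0.0, -1.0, -1.0, 0.0])   # log-rate peaks at the origin
gamma = np.array([0.5, 0.1])                         # recent spikes raise the rate
lam_center = np.exp(log_cif_gq(0.0, 0.0, [0, 0], beta, gamma))
lam_edge = np.exp(log_cif_gq(1.0, 1.0, [0, 0], beta, gamma))
print(lam_center > lam_edge)  # -> True
```

A recent spike (nonzero `hist_counts`) raises the log rate additively, which is how bursting enters the GLM.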

*Decoding.* In light of equations 2.2 and 2.3, the likelihood $p(\Delta^c_t \mid x_t, y_t)$ is evaluated as in Barbieri et al. (2001), with the integration in equation 2.6 computed over the spatial position covariates of either representation. In writing the above probabilities and equations, it is assumed that the rat’s spatial position remains unchanged at the fine timescales of neuronal spiking. Note that the spike history term can be estimated directly from the data, as the spike history is known at any time during the decoding process.

A uniform prior *P*(*x*,*y*) is used for MAP estimation in equation 2.3. A similar encoding and decoding formulation can be used for spike history–independent models, except that the hyperparameter *J* is set to zero. The MAP estimates derived from both spike history–dependent and –independent models are then smoothed by bidirectional filtering with a two-pole Butterworth filter whose cutoff frequency (in Hz) is chosen to retain 95% of the power of the coordinate signals, yielding the final smooth estimate.
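The bidirectional (zero-phase) Butterworth smoothing step can be sketched with SciPy's `butter` and `filtfilt`. The sampling rate, the 1 Hz cutoff, and the toy trajectory below are assumptions for illustration, not the values fitted in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_trajectory(x_hat, fs, fc):
    """Zero-phase smoothing of raw MAP estimates: a two-pole Butterworth
    low-pass applied forward and backward (filtfilt), so no phase lag is
    introduced. fc is the cutoff in Hz (illustrative value here)."""
    b, a = butter(2, fc / (fs / 2.0))   # normalized cutoff in (0, 1)
    return filtfilt(b, a, x_hat)

# Toy usage: a slow drift corrupted by fast jitter.
fs = 30.0                               # estimate rate (Hz), assumed
t = np.arange(0, 10, 1 / fs)
clean = np.sin(2 * np.pi * 0.2 * t)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(t.size)
smooth = smooth_trajectory(noisy, fs, fc=1.0)
print(np.mean((smooth - clean) ** 2) < np.mean((noisy - clean) ** 2))  # -> True
```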

*Model estimation and validation.* The parameters of history-independent models are estimated by maximum likelihood regression, whereas the parameters of history-dependent models are estimated using $\ell_2$-regularized (ridge) GLM regression. The log-linear assumption on the parameters results in a convex likelihood function that allows for efficient computation of the ML estimate (McCullagh, 1984; Truccolo et al., 2005). We use a conjugate gradient algorithm to solve the convex optimization problem.

For simplicity, despite the shape differences in receptive fields, we assume that all neuronal place receptive fields have the same level of history dependence and the same number of parameters. Therefore, we use one vector of hyperparameters shared across all cells. These hyperparameters are chosen by maximizing the *R*^{2} statistic, computed as a measure of decoding performance on the validation data set, using the hill-climbing algorithm (Yuret & De La Maza, 1993), which guarantees convergence to a local maximum. In particular, the first parameter is varied while the second parameter is fixed at some initial value. A maximum of decoding performance is found in the first dimension, and the first parameter is then fixed at the corresponding value. The second parameter is then varied, and the value that provides maximum decoding performance is obtained. The process is repeated until a local maximum in all dimensions is found: namely, until no improvement in decoding performance is observed by varying either parameter in any direction.
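The coordinate-wise hill-climbing procedure described above can be sketched as follows; the `score` function and the parameter grids are hypothetical stand-ins for the validation *R*^{2} and hyperparameter ranges.

```python
def hill_climb(score, init, grids, max_rounds=20):
    """Coordinate-wise hill climbing over a hyperparameter grid: vary one
    parameter at a time with the others fixed, keep the best value, and
    repeat until no single-parameter change improves the score (a local
    maximum). `score` maps a parameter tuple to validation performance.
    """
    params = list(init)
    best = score(tuple(params))
    improved, rounds = True, 0
    while improved and rounds < max_rounds:
        improved = False
        rounds += 1
        for i, grid in enumerate(grids):
            for v in grid:                      # sweep dimension i only
                trial = params[:i] + [v] + params[i + 1:]
                s = score(tuple(trial))
                if s > best:
                    best, params, improved = s, trial, True
    return tuple(params), best

# Toy usage: a validation score that peaks at (3, 7).
r2 = lambda p: -((p[0] - 3) ** 2 + (p[1] - 7) ** 2)
p, s = hill_climb(r2, init=(0, 0), grids=[range(10), range(10)])
print(p)  # -> (3, 7)
```

For a unimodal score surface such as this toy one, the local maximum found is also the global one; with a multimodal surface the result depends on the initial values, which is why convergence is only guaranteed to a local maximum.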

Out of the 30.36 minutes of training data, the first 85% is used for model building, and the remaining 15% is used to validate the hyperparameters. Once the hyperparameters are validated, the model is retrained on the entire 30.36 minutes of training data and assessed on the held-out 6-minute test data.

#### 2.3.2 Nonparametric Models

*Encoding.* Let *x*, *y*, and *h* be the outcomes of the random variables *X*, *Y*, and *H*, respectively, and let the spike indicator of cell *c* equal 1 if and only if cell *c* fires an action potential in the current time bin and 0 otherwise.

The CIF of cell *c* can be written in the following equivalent form as a ratio of two densities (Kloosterman et al., 2014):

$$\lambda^c(x, y, h) = \frac{N}{T} \cdot \frac{p(x, y, h \mid \text{spike})}{p(x, y, h)},$$

where *N* denotes the total number of spikes within the total duration *T*. The numerator is a 3D conditional pdf of the spatial position and history conditioned on neuronal spiking, and the denominator is a 3D joint pdf of the spatial position and spike history. Note that the time and cell indices are dropped from *x*, *y*, and *h* in these definitions, as the densities are defined on the space of *X*, *Y*, and *H*, which take the particular values $x_t$, $y_t$, and $h^c_t$ for any given cell *c* at any given time *t*. The use of the logarithm in the history function allows for a smoother dependence of $\lambda^c$ on *h*, which in turn captures high-frequency components in the CIF due to refractoriness (i.e., a sharp decrease in $\lambda^c$ right after emitting a spike) and bursting.
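The density-ratio form of the CIF can be sketched in a simplified position-only (2D) version, with both densities estimated by a kernel density estimator. The function `cif_from_densities`, the toy data, and the use of `scipy.stats.gaussian_kde` are illustrative assumptions rather than the paper's estimator.

```python
import numpy as np
from scipy.stats import gaussian_kde

def cif_from_densities(xy_all, xy_at_spikes, T):
    """Spatial CIF as a ratio of densities (2D for clarity; the full model
    also includes the log-history coordinate h):
        lambda(x, y) ~= (N / T) * p(x, y | spike) / p(x, y)
    Both pdfs are estimated with kernel density estimators.
    """
    N = xy_at_spikes.shape[1]                  # total number of spikes
    p_spike = gaussian_kde(xy_at_spikes)       # pdf of position given a spike
    p_occ = gaussian_kde(xy_all)               # occupancy pdf of position
    return lambda q: (N / T) * p_spike(q) / np.maximum(p_occ(q), 1e-12)

# Toy usage: spikes concentrated near (1, 1) inside a broad occupancy cloud.
rng = np.random.default_rng(1)
occ = rng.normal(0.0, 2.0, size=(2, 2000))     # positions visited
spk = rng.normal(1.0, 0.3, size=(2, 200))      # positions at spike times
lam = cif_from_densities(occ, spk, T=120.0)
in_field = lam([[1.0], [1.0]])[0]
out_field = lam([[-2.0], [-2.0]])[0]
print(in_field > out_field)  # -> True
```

Dividing by the occupancy density is what distinguishes a true rate map from a raw spike-position histogram: positions the rat rarely visits are not mistaken for low-rate positions.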

To simplify estimation, we assume that, given that the neuron spiked, the position conveys no additional information about *h*; that is, $p(x, y, h \mid \text{spike}) = p(x, y \mid \text{spike}) \, p(h \mid \text{spike})$. This assumption is reasonable, as the spiking history *h* depends on the spatial coordinates only through the spiking patterns of the neuron, which are characterized by the CIF. The assumption reduces a 3D density estimation problem to a 2D one. Its validity is tested by performing goodness-of-fit tests and decoding analysis on the held-out test data. A direct way to verify the validity of this assumption is to calculate the dependencies between *H* and *X* and between *H* and *Y* given a spike (for details, see Agarwal, Sacre, & Sarma, 2015).

To estimate history-independent encoding models, we drop the history covariate *h* from the numerator and denominator densities.

*Decoding.* Assuming a change of variable between $\Delta^c_t$ and $h^c_t$, instead of using equation 2.3, we use an equivalent yet simpler form of the MAP estimator, which yields the identical estimate by the rules of probability, since the transformed likelihood (see equation B.3 in appendix B) has the same dependence on $(x_t, y_t)$. To simplify equation 2.14, we assume that $\Delta^c_t$ is a sufficient statistic (Lehmann & Scheffé, 1950) for the history of the *c*th neuron. That is, the full history $h^c_t$ conveys no additional information once $\Delta^c_t$ is known. Under this assumption, the required density (the cell index *c* has been dropped for brevity of notation) is estimated through a one-to-one nonlinear transformation of the densities fit in the encoding step; the second equality in the resulting expression follows from the rules of probability (see equations B.4 and B.5 in appendix B). The component densities are estimated nonparametrically, as described in the next section.

For spike history–independent decoding models, the likelihood in equation 2.3 is first simplified by the sufficient statistic assumption and then calculated under a pseudo-equilibrium assumption (as done in GLM decoding).

The MAP estimates $(\hat{x}_t, \hat{y}_t)$ are computed by maximizing equation 2.3 or 2.14 over a 2D grid of the coordinate space. These estimates are then smoothed by bidirectional filtering with a two-pole Butterworth filter whose cutoff frequency (in Hz) is chosen to retain 95% of the power of the true coordinates, yielding the final smooth estimate. Figure 3 provides a summary of the encoding and decoding processes.

*Model estimation and validation.* Note that the rate functions appear as products and ratios of densities. These densities are estimated using both second-order kernel density estimators (KDE) (Silverman, 1986) and our recently developed nonparametric band-limited ML (BLML) estimator (Agarwal, Chen et al., 2015; see also appendix A). Specifically, the two nonparametric density estimators are defined as follows:

- • Kernel density estimation (KDE) assumes that the pdf is a linear combination of kernel functions centered at the data samples:
$$\hat{p}_{\text{KDE}}(v) = \frac{1}{n} \sum_{i=1}^{n} K_b(v - v_i).$$
Here, the vectors $v_i$ are the data samples, and $K_b$ is a *d*-dimensional gaussian kernel with bandwidth vector *b*. We have used the isotropic kernel with the optimal bandwidth scaling $b \propto n^{-1/(d+4)}$ (Silverman, 1986), where the proportionality *constant*s are determined by a cross-validation procedure.
- • Band-limited maximum likelihood (BLML) estimation assumes that the square root of the pdf is band-limited and smooth (Agarwal, Chen et al., 2015). The estimator has the form
$$\hat{p}_{\text{BLML}}(v) = \left( \sum_{i=1}^{n} c_i \prod_{k=1}^{d} \operatorname{sinc}\big(2 f_k (v^{(k)} - v^{(k)}_i)\big) \right)^{2},$$
where the vectors $v_i$ are the data samples, $f$ denotes the vector of cutoff frequencies, and the coefficient vector $c$ is computed by solving an optimization problem (see appendix A for details). Note that $\hat{p}_{\text{BLML}}$ appears similar to $\hat{p}_{\text{KDE}}$, but it differs in the following ways: (1) BLML is a nonparametric ML estimator, while KDE is not optimal in any simple statistical sense; (2) the summation in the BLML estimator is squared; and (3) the coefficients in front of the sinc functions in the BLML estimator are computed by solving an optimization problem that falls out of the ML procedure. The BLML estimator is computed using the algorithm (Agarwal, Chen et al., 2015) defined in appendix A.
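A minimal 1D sketch of the KDE branch follows, with a Silverman rule-of-thumb bandwidth standing in for the cross-validated constants (the BLML coefficients require the appendix's optimization and are not reproduced here):

```python
import numpy as np

def kde_gaussian(v, data, bw):
    """1D gaussian kernel density estimate: the average of gaussian bumps of
    width `bw` centered at the data samples (the KDE form in the text)."""
    z = (np.asarray(v)[:, None] - data[None, :]) / bw
    return np.exp(-0.5 * z * z).sum(axis=1) / (len(data) * bw * np.sqrt(2 * np.pi))

def silverman_bw(data):
    """Silverman's rule-of-thumb bandwidth for 1D gaussian kernels."""
    return 1.06 * np.std(data) * len(data) ** (-1 / 5)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=500)
bw = silverman_bw(data)
grid = np.linspace(-4, 4, 81)
pdf = kde_gaussian(grid, data, bw)
# The estimate should roughly integrate to 1 and peak near the true mean 0.
print(pdf.sum() * (grid[1] - grid[0]), grid[np.argmax(pdf)])
```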

For simplicity, despite the shape differences in place receptive fields, we assume that all neuronal place receptive fields have the same level of smoothness in the *x* and *y* directions across the entire population. Namely, we use one vector of bandwidths (for the KDE estimator) and one vector of cutoff frequencies (for the BLML estimator) for the entire neuronal population. To select these hyperparameters, performance was evaluated on validation data as in Kloosterman et al. (2014). Specifically, the combination of bandwidths and cutoff frequencies that produces the best decoding performance on the validation data (as measured by the *R*^{2} statistic) is selected. As the optimization over cutoff frequencies is a 3D problem, a hill-climbing strategy (Yuret & De La Maza, 1993) (as done for the parametric models) is used to find cutoff frequencies and bandwidths that jointly maximize decoding performance locally. In particular, the first parameter is varied while the second and third parameters are fixed at some initial values. A maximum of decoding performance is obtained in the first dimension, and the first parameter is then fixed at the corresponding value. The second and third parameters are then varied in turn, and the respective values that provide maximum decoding performance are obtained. The process is repeated until a local maximum in all dimensions is found: namely, until no improvement in decoding performance is observed by varying any of the three parameters in any direction.

As with the GLM estimation, out of the 30.36 minutes of training data, the first 85% is used for model building and the remaining 15% is used to validate the hyperparameters. Once the hyperparameters are validated, the model is retrained on the entire 30.36 minutes of training data and assessed on the held-out 6-minute test data.

### 2.4 Performance Evaluation

#### 2.4.1 Encoding Models

The goodness of fit of the eight estimators of the CIF (the two parametric and two nonparametric models, each with and without spike history) is assessed using the time-rescaling theorem and the Kolmogorov-Smirnov (KS) statistic (Brown et al., 2002) on 6 minutes of test data. In particular, we test whether the rescaled spike times are distributed as a Poisson process with rate 1. The similarity between the two cumulative distribution functions is quantified using the KS statistic normalized by its 95% confidence bound, which scales as $1/\sqrt{n_s}$ with the number of spikes $n_s$. Due to this normalization, a value of KS > 1 indicates that the estimated CIF is too extreme (*P* < 0.05, one-sample KS test) to have generated the test data. The closer the normalized KS statistic is to 0, the better the estimate.
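The time-rescaling goodness-of-fit check can be sketched as follows; the normalization by the 95% bound $1.36/\sqrt{n}$ and the toy homogeneous-Poisson data are illustrative assumptions.

```python
import numpy as np

def rescaled_ks(spike_times, cif_fn, dt=1e-3, t_end=None):
    """Time-rescaling goodness of fit: integrate the fitted CIF between
    consecutive spikes, map the rescaled intervals through 1 - exp(-tau)
    (uniform on [0, 1] under a correct model), and return the KS distance
    to the uniform CDF, normalized by the 95% bound 1.36/sqrt(n).
    """
    t_end = spike_times[-1] if t_end is None else t_end
    t = np.arange(0.0, t_end, dt)
    Lam = np.cumsum(cif_fn(t)) * dt            # integrated intensity
    taus = np.diff(np.interp(spike_times, t, Lam))
    u = np.sort(1.0 - np.exp(-taus))
    n = len(u)
    emp = (np.arange(1, n + 1) - 0.5) / n      # empirical CDF at sorted points
    ks = np.max(np.abs(u - emp))
    return ks / (1.36 / np.sqrt(n))            # < 1: within 95% bounds

# Toy usage: homogeneous Poisson spikes at 5 Hz checked against the true rate.
rng = np.random.default_rng(2)
spikes = np.cumsum(rng.exponential(1 / 5.0, size=400))
print(rescaled_ks(spikes, lambda t: np.full_like(t, 5.0)))
```

A grossly wrong rate function (e.g., 0.5 Hz for the same spike train) drives the normalized statistic well above 1, which is what the KS > 1 criterion in the text detects.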

#### 2.4.2 Decoding Models

The decoding model performance is assessed by the *R*^{2} statistic of the estimates obtained from 6 minutes of test data. In addition, box plots with the median and the 5%, 25%, 75%, and 95% quantiles of the decoding errors for all eight models are presented.

All parametric and nonparametric methods are implemented in customized software written in Matlab 2014a (MathWorks, Natick, MA).

## 3 Results

### 3.1 Hyperparameter Validation

The validated hyperparameters of parametric and nonparametric models are summarized in Table 1. Note that spike history–independent parametric models contain no hyperparameter.

### 3.2 Encoding: CIF Estimation

Figure 4 shows the estimated CIFs obtained using the GLM methods for a unimodal and a multimodal place cell, respectively. In Figure 4A, the parametric model with and without history is compared. As can be seen in Figures 4A(ii,iii), both variants capture the position dependence. However, as shown in Figure 4A(v), the CIF estimated with no spike history dependence fails to capture the bursting and refractoriness phenomena, as evidenced by the KS statistic (2.01 versus 1.34 with spike history dependence) and the negative log likelihood (3.00 versus 2.28 with spike history dependence) (see Figure 4A(vi)).

In Figure 4B, the performance of the gaussian and Zernike models (both with history dependence) on data from a multimodal place cell is shown. The gaussian model fails to capture the spatial complexity in the firing patterns of the multimodal place cell. Although the Zernike model did better in capturing the spatial complexity, a KS statistic of 2.2 implies scope for improvement in model fitting (see Figure 4B(vi)).

The encoding results are summarized in Table 2. In contrast to the GLM, the nonparametric encoding models yield better performance for the same multimodal cell. Figure 5B shows that the KS statistics for both nonparametric estimators (KS = 0.58 and KS = 0.84) are lower than 1. Similarly, their negative log-likelihood values (0.85 and 0.98) are better than those of the parametric models (1.83 and 1.43). The corresponding CIF estimates from the two nonparametric methods are shown in Figures 5C and 5E, respectively. First, it can be seen that for both place and grid cells, the estimated CIF is larger wherever the neuron spikes more, verifying the “place-like” and “grid-like” structures. Second, the band-limited ML (BLML) method produces a smoother CIF estimate than the KDE method. Figures 5D and 5F show the corresponding history dependencies for the two methods, respectively.

| Model | Median KS Statistic | Median Negative Log Likelihood | Median Decoding Error (cm) | *R*^{2} |
| --- | --- | --- | --- | --- |
| **Unimodal cells** | | | | |
| GLM-gaussian (w/o, w/ history) | 4.33, 2.04 | 3.00, 2.36 | 23.59, 24.01 | 0.40, 0.44 |
| GLM-Zernike (w/o, w/ history) | 3.80, 1.88 | 2.90, 2.28 | 24.21, 28.25 | 0.27, 0.32 |
| KDE (w/o, w/ history) | 4.41, **0.99** | 2.91, 2.29 | 25.67, 15.39 | 0.40, 0.80 |
| BLML (w/o, w/ history) | 4.04, 1.01 | 2.94, 1.82 | 22.29, 15.51 | 0.55, 0.83 |
| **Multimodal cells** | | | | |
| GLM-gaussian (w/o, w/ history) | 5.74, 2.50 | 3.00, 2.00 | 29.70, 30.81 | 0.36, 0.34 |
| GLM-Zernike (w/o, w/ history) | 4.65, 2.25 | 2.72, 1.83 | 21.58, 23.62 | 0.44, 0.55 |
| KDE (w/o, w/ history) | 4.86, 1.63 | 2.70, 2.07 | 19.07, 14.12 | 0.64, 0.83 |
| BLML (w/o, w/ history) | 4.65, 1.30 | 2.66, **1.22** | 19.99, 14.42 | 0.62, 0.83 |
| **All combined** | | | | |
| GLM-gaussian (w/o, w/ history) | 4.82, 2.36 | 3.00, 2.15 | 24.24, 23.63 | 0.47, 0.50 |
| GLM-Zernike (w/o, w/ history) | 4.27, 2.20 | 2.82, 1.94 | 20.04, 23.40 | 0.47, 0.59 |
| KDE (w/o, w/ history) | 4.58, 1.35 | 2.80, 2.25 | 21.75, 12.85 | 0.64, 0.88 |
| BLML (w/o, w/ history) | 4.25, **1.09** | 2.79, 1.52 | 18.21, **12.39** | 0.72, **0.89** |


Note: The best performance in each column is shown in bold.

Figure 5G shows the estimated CIF over time from the two nonparametric methods. Both methods are successful in capturing refractoriness and bursting in the neuronal activity. Specifically, refractoriness is captured by small CIF values immediately after a spike across all positions, and bursting is captured by suddenly large CIF values when the previous spike occurred within roughly 4 ms, again across all positions (see Figures 5D and 5F). The CIF values achieved shortly after a spike are higher than those achieved at longer times since the last spike, even at strongly preferred positions. This suggests that even if the position-driven rate is small, the CIF can still reach a significant value if a spike occurred in the past 4 ms. This in turn makes the CIF almost constant beyond 4 ms after any spike, and therefore nearly independent of the spike history there. These findings confirm that bursting in place and grid cells is an internal phenomenon that is not affected much by the external stimulus.

Figure 6 shows the recorded and simulated activity (using the fitted nonparametric model) for the multimodal place cell shown in Figure 5. It can be seen that at both macro- and microlevels, the nonparametric model is able to reproduce the recorded activity: the spatial dependence is nicely captured, as shown in Figure 6B, along with microlevel bursting, as shown in Figure 6D. Further, it can be seen in the zoomed-in view in Figure 6C that there is high variability (Fenton & Muller, 1998; Moser et al., 2008) between the spike counts on different trajectories that pass through the region, even though the CIF is almost constant over this small region. This effect is replicated in the simulated activity due to the incorporation of spike history. Statistical dependence on the spike history, particularly bursting, makes the neuron either spike at least three or four times once it fires or not fire at all, resulting in increased variance.

For the entire population of unimodal place cells, multimodal place cells, and grid cells, models with spike history are significantly better, as measured by KS statistics and predictive log likelihoods, than the corresponding models without spike history, for each of the four methods (KS statistic *P* = 2.5e-10, 2.4e-10, 2.4e-10, and 2.5e-10; log likelihood *P* = 2.39e-10, 1.94e-05, 3.57e-08, and 2.39e-10; paired Wilcoxon signed-rank test). Further, the band-limited ML (BLML) method with spike history (median KS statistic = 1.09, median negative log likelihood = 1.52) performs significantly better than the KDE model with spike history (median KS statistic 1.35, negative log likelihood 2.25), the gaussian GLM with spike history (median KS statistic 2.36, negative log likelihood 2.15), and the Zernike GLM with spike history (median KS statistic 2.2, negative log likelihood 1.94) (respective *P*-values for the KS statistic and negative log likelihood: *P* = 2.19e-05 and 2.7e-10, *P* = 1.03e-09 and 2.5e-10, and *P* = 1.03e-09 and 4.7e-10; paired Wilcoxon signed-rank test). Box plots showing the KS statistic and negative log likelihood values for all eight statistical models are shown in Figures 7 and 8. As expected, the improvement in CIF estimation using the nonparametric methods is greater for the multimodal place cell and grid cell populations than for the unimodal place cell population, as reflected in the larger gap between the median negative log likelihoods of the parametric and nonparametric models with spike history.

### 3.3 Decoding

The decoding results are summarized in Table 2. In addition, Figure 9 summarizes the decoding errors obtained using 6 minutes of test data over the population of unimodal place cells, multimodal place cells, and grid cells, using the band-limited ML (BLML), KDE, gaussian GLM, and Zernike GLM methods with and without spike history covariates. Overall, models with spike history (respective *R*^{2} statistics: 0.89, 0.88, 0.50, 0.59; respective median decoding errors: 12.39, 12.85, 23.62, 23.40 cm) perform significantly better than their counterparts without spike history (respective *R*^{2}: 0.72, 0.64, 0.47, 0.47; respective median decoding errors: 18.21, 21.75, 24.24, 20.04 cm; respective *P*-values of signed-rank test: 9.1e-97, 1.7e-135, 7.4e-07, 1.6e-05). Note that although the *R*^{2} value is better for the Zernike GLM with spike history (0.59 versus 0.47), its median decoding error is slightly worse (23.4 versus 20.04 cm). More important, BLML with spike history performs best among all four methods (respective *P*-values of signed-rank test in decoding error: 0.02, 5.7e-165, and 4.0e-203 versus the KDE, gaussian GLM, and Zernike GLM). Similarly, BLML without spike history also performs significantly better than the other three methods in decoding error (respective *P*-values: 8.6e-27, 9.9e-54, and 1.01e-16). From Table 2, it can be seen that BLML with or without spike history also performs best in decoding the rat’s position using only unimodal cells or only multimodal cells. Interestingly, BLML has better *R*^{2} performance than KDE for unimodal cells, yet worse performance for multimodal cells.

As an illustration, Figure 10 shows the rat’s trajectory reconstructed using the two nonparametric methods with spike history, respectively, for a 1-minute period of the test data. The animal’s actual position (black) is overlaid on the estimates (green and red), and each subpanel shows the reconstruction for a 20-second period. A video clip demonstrating our decoding performance on the experimental data is provided in the supplementary material.

## 4 Discussion and Conclusion

### 4.1 Nonparametric Modeling: Flexibility and Robustness

In this letter, we develop a novel nonparametric approach to study encoding properties and decoding performance in hippocampal place cells and entorhinal grid cells. Our approach is inspired by the nonparametric encoding framework previously proposed in Chen et al. (2012) and Kloosterman et al. (2014). The basic idea of that framework is to convert the encoding or decoding problem into a density estimation problem. In this letter, two nonparametric density estimators are constructed: the band-limited ML (BLML) estimator (Agarwal, Chen et al., 2015) and the standard kernel density estimator (KDE). The performance of these two nonparametric models is then compared to that of parametric GLM models (using either quadratic or Zernike polynomial representations). The BLML model has the best encoding performance, whereas the KDE method is comparable to BLML in decoding performance. Both BLML and KDE outperform the parametric models in both encoding and decoding performance.

The proposed neural encoding models also capture the statistical dependence of current spiking activity on spike history. This allows modeling of neuronal spiking at a millisecond timescale, and hence both the bursting and refractoriness phenomena are captured successfully. Further, our result can also explain the observed extra variance in spiking activity, which cannot be sufficiently accounted for by an inhomogeneous Poisson process (Fenton & Muller, 1998; Moser et al., 2008). This improved the decoding performance of our methods over state-of-the-art decoding methods that ignore spike history dependence and hence incorrectly assume that neuronal spiking is an inhomogeneous Poisson process.

Due to the nonparametric nature of the BLML estimator, the same model structure (with the same cutoff frequency) is able to characterize the variety of receptive field types observed in both place and grid cells. This modeling flexibility is an important step toward a practical solution for neural decoding, as required in brain-machine interfaces. Currently, neural decoding via implantable devices suffers from random drifts in recordings and in neuronal firing over time. Such drifts may include changes in the firing patterns of the cell itself due to changes in neural structure or in the environment (Moser et al., 2008; Frank, Stanley, & Brown, 2004; Muller & Kubie, 1987). Therefore, to achieve robust decoding, the encoding models should be flexible enough to model a variety of receptive fields. As shown in this letter, the nonparametric method constructed with the BLML estimator is statistically and computationally efficient, is flexible enough to characterize a variety of receptive field types in the encoding phase, and may remain robust if the receptive field structure changes over time.

More importantly, our neural decoding results represent the first attempt to decode the rat’s trajectory in a 2D space using multimodal place receptive fields (from either place cells or grid cells) together with spike history. The popular parametric GLM framework is insufficient for modeling such multimodal place and grid cells, with or without spike history (Barbieri et al., 2004; Brown et al., 2002). Previous attempts using nonparametric models concentrated mainly on decoding from unsorted multiunit activity, and the decoding was achieved in a 1D environment (Kloosterman et al., 2014) without using the information stored in spike history.
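The decoding problem described here can be sketched with a standard grid-based Bayesian decoder under a Poisson likelihood: given per-neuron rate maps $\lambda_i(x)$ and spike counts $n_i$ in a bin of width $\Delta$, the posterior over position under a uniform prior is $p(x \mid n) \propto \prod_i \lambda_i(x)^{n_i} e^{-\lambda_i(x)\Delta}$. This sketch omits spike history and uses synthetic 1D rate maps; the cell count, field widths, and bin size are illustrative assumptions.

```python
import numpy as np

# Grid-based Bayesian decoding sketch with a Poisson spiking likelihood.
# Rate maps and spike counts are synthetic stand-ins for recorded cells.

rng = np.random.default_rng(2)
xs = np.linspace(0, 1, 200)                       # 1D position grid
centers = rng.uniform(0, 1, 30)                   # 30 place cells, random field centers
rate = 2 + 25 * np.exp(-0.5 * ((xs[None, :] - centers[:, None]) / 0.08) ** 2)

dt, x_true = 0.25, 0.37                           # 250 ms bin, true position
lam_true = 2 + 25 * np.exp(-0.5 * ((x_true - centers) / 0.08) ** 2)
counts = rng.poisson(lam_true * dt)               # observed spike counts in this bin

# log posterior (uniform prior; additive constants dropped)
log_post = (counts[:, None] * np.log(rate) - rate * dt).sum(axis=0)
x_hat = xs[np.argmax(log_post)]                   # maximum a posteriori position
```

Replacing the Poisson term with a history-dependent conditional intensity, as in the models of this letter, changes only the likelihood line; the grid-based posterior computation is unchanged.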

### 4.2 Fourier Representation in Place Receptive Fields

Recently, researchers (Kubie & Fox, 2015; Ormond & McNaughton, 2015) have hypothesized that the complexity in hippocampal place receptive fields arises from a Fourier-like summation of multiple grid cell receptive fields (see Figure 11), where a grid cell’s receptive field can be closely approximated by a 2D cosine function with a specific spatial frequency. These grid cells are generally arranged in anatomical regions called modules (see Figure 11A), each containing grid cells with similar spatial frequencies (Stensola et al., 2012). The hippocampal place cells receive projections from grid cells across multiple modules (each with a distinct spatial frequency). A toy example of such a sum is shown in Figure 11B. This hypothesis is called the Fourier hypothesis. If the Fourier hypothesis is biologically plausible, it would also explain the superior performance of the nonparametric BLML estimator, which assumes that place and grid fields are supported by sinusoidal functions, as in the Fourier representation. However, the superior performance of the BLML estimator in this investigation may instead reflect the general superiority of nonparametric density estimators. Which of these explanations is correct remains to be tested.
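A toy version of this summation is easy to reproduce: adding 2D cosine "grid" inputs across modules with different spatial frequencies and aligned phases concentrates mass at one location, producing a place-field-like bump. The frequencies and equal weights below are arbitrary illustrative choices in the spirit of Figure 11B, not the specific example from the letter.

```python
import numpy as np

# Fourier-hypothesis toy: a sum of cosine grid inputs over five "modules"
# (one spatial frequency each, phases aligned at the origin) yields a
# single dominant bump at the shared phase alignment point.

x = np.linspace(-0.4, 0.4, 321)                   # restrict to one period's interior
X, Y = np.meshgrid(x, x)
freqs = [1.0, 2.0, 3.0, 4.0, 5.0]                 # one spatial frequency per module
field = sum(np.cos(2 * np.pi * f * X) * np.cos(2 * np.pi * f * Y) for f in freqs)
field -= field.min()                              # shift to a nonnegative "rate"

iy, ix = np.unravel_index(np.argmax(field), field.shape)  # peak location indices
```

With more modules (and incommensurate frequencies) the side lobes shrink further, which is the intuition behind the hypothesis that place fields are Fourier-like sums of grid fields.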

### 4.3 Extension and Future Work

Several extensions of our work can be considered in future investigations. First, in neural encoding, we can in principle incorporate other informative neural covariates, such as the power or phase of the local field potential (Agarwal & Sarma, 2010, 2012a; Agarwal et al., 2014). We expect that this will further increase the performance of the encoding model. In addition, an efficient procedure for bandwidth parameter selection is important for obtaining good performance; details on this investigation can be found in our previous work (Agarwal, Chen et al., 2015). Second, we can formulate the neural decoding problem within a state-space framework (Brown et al., 1998; Chen, 2013, 2015) by considering a smooth temporal prior, which will further improve decoding accuracy. Finally, our methodology is not limited to rodent hippocampal place cells or grid cells; it can be extended to other modalities, such as the retina, lateral geniculate nucleus, primary visual cortex, and primary motor cortex.

## Appendix A: Algorithm for the BLML Estimator

To make this letter self-contained, we present a short introduction to the BLML estimator and its algorithm. Detailed results are provided in Agarwal, Chen et al. (2015).

### A.1 Formulation of the BLML Estimator

Let $f_c$ denote the cutoff frequency (in Hz). Define $\mathcal{C}_{f_c}$ as the space of square-integrable functions band-limited to $[-f_c, f_c]$, and let $\mathcal{P}_{f_c}$ denote the corresponding class of pdfs $p = g^2$ with $g \in \mathcal{C}_{f_c}$ and $\int p(x)\,dx = 1$. These spaces are equipped with the inner product $\langle g, h \rangle = \int g(x)h(x)\,dx$ and the induced norm $\|g\| = \langle g, g \rangle^{1/2}$; in particular, $\|g\|^2 = \int p(x)\,dx = 1$ for every $p = g^2 \in \mathcal{P}_{f_c}$.

### A.2 The Likelihood Function for Band-Limited PDFs

Consider $n$ independent realizations $x_1, \ldots, x_n$ of a random variable with pdf $p = g^2$, where $g$ is band-limited with cutoff $f_c$. The likelihood of observing $x_1, \ldots, x_n$ is
$$L(g) = \prod_{i=1}^{n} p(x_i) = \prod_{i=1}^{n} g(x_i)^2.$$
Let $\hat g$ denote the band-limited function with $\|\hat g\| = 1$ that maximizes this likelihood function (see equation A.6). Then the BLML estimator is $\hat p_{\mathrm{BLML}}(x) = \hat g(x)^2$.
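The unit-norm constraint on $\hat g$ reduces to a quadratic form in the expansion coefficients via a sinc Gram matrix: for $g(x) = \sum_j c_j \,\mathrm{sinc}(2 f_c (x - x_j))$ with $\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$, one has $\int g(x)^2\,dx = c^\top S c / (2 f_c)$ with $S_{ij} = \mathrm{sinc}(2 f_c (x_i - x_j))$. This identity can be checked numerically; the centers and coefficients below are arbitrary choices, not data from the letter.

```python
import numpy as np

# Numerical check of the sinc Gram-matrix identity:
#   integral of g(x)^2 over the real line  ==  c^T S c / (2 f_c),
# where S_ij = sinc(2 f_c (x_i - x_j)). Centers/coefficients are arbitrary.

fc = 1.5
centers = np.array([-0.7, 0.1, 0.4, 1.2])
c = np.array([0.3, -0.5, 0.8, 0.2])

def g(x):
    # band-limited function as a sinc expansion around the centers
    return np.sinc(2 * fc * (x[:, None] - centers[None, :])) @ c

S = np.sinc(2 * fc * (centers[:, None] - centers[None, :]))
closed_form = c @ S @ c / (2 * fc)

# Riemann-sum approximation of the integral on a wide, fine grid
xs = np.linspace(-100, 100, 1_000_001)
numeric = (g(xs) ** 2).sum() * (xs[1] - xs[0])
```

The same quadratic form reappears in the optimization of appendix A.3, which is why the Gram matrix $S$ is the central object of the algorithm.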

### A.3 Choice of Optimal Orthant

Agarwal, Chen et al. (2015) showed that the asymptotic solution of equation A.10 lies in the orthant with the all-ones indicator vector if $\sqrt{f_x}$ is band-limited and $f_x$ is strictly positive. Therefore, the optimal orthant vector can be chosen trivially, and the optimization is then solved within that orthant to compute the estimator. It is important to note that these asymptotic properties begin to manifest at moderate sample sizes, and the strict-positivity condition is obeyed by many pdfs encountered in nature. Therefore, the two conditions are not very restrictive.

### A.4 Algorithm

The cutoff frequency $f_c$ is chosen such that the resulting BLML estimator also converges at the desired rate; conditions under which this holds are given in Agarwal, Chen et al. (2015).

### A.5 Implementation and Computational Complexity

First consider the following theorem (Agarwal, Chen et al., 2015).

Consider *n* i.i.d. observations of a random variable *x* whose pdf has a tail decaying with exponent *r*. Then, for large *n*, the number of bins *B* required by the algorithm grows with *n* at a rate determined by *r*.

Let $i$ denote the index over samples and $b$ the index over bins. The implementation of the algorithm consists of the following steps:

- Compute the bin counts from the samples $x_1, \ldots, x_n$.

- Sort the samples and construct the Gram matrix. Note that it is a block-Toeplitz matrix (a Toeplitz arrangement of blocks, with each block itself Toeplitz).

- Use convex optimization algorithms to solve for the coefficient vector. Newton’s method takes a finite number of iterations to reach a given tolerance, since the cost function is self-concordant (Boyd & Vandenberghe, 2004); therefore, the computational complexity of the optimization is the same as the computational complexity of one iteration, which in turn is the complexity of solving the Newton system. As the matrix has a block-Toeplitz structure, the Akaike algorithm (Akaike, 1973) can be used to evaluate each iteration of Newton’s method efficiently. Simulations show that the Hessian can be approximated accurately (to machine accuracy) by a low-rank matrix, so the inversion can be performed cheaply. Further, in some cases one may end up with a large number of bins *B* (e.g., if the true pdf has heavy tails), so that storing the Hessian matrix becomes expensive; in such cases, a quasi-Newton or gradient-descent optimization algorithm can be used, which computes the estimator fairly quickly.

- Evaluate the estimate at the *l* given points.
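The block-Toeplitz claim in the second step can be illustrated in one dimension: once samples are snapped to a uniform grid of bin centers, the sinc Gram matrix depends only on the index difference and is therefore constant along each diagonal (in 2D, the same argument gives a Toeplitz arrangement of Toeplitz blocks). A minimal sketch with arbitrary parameters:

```python
import numpy as np

# After binning, S_ij = sinc(2 f_c (b_i - b_j)) depends only on i - j,
# so every diagonal of S is constant (Toeplitz). Parameters are arbitrary.

fc, width = 2.0, 0.05
bins = np.arange(0, 1, width) + width / 2          # sorted, uniformly spaced bin centers
S = np.sinc(2 * fc * (bins[:, None] - bins[None, :]))

# check that each diagonal of S is constant
n = len(bins)
is_toeplitz = all(
    np.allclose(np.diag(S, k), np.diag(S, k)[0]) for k in range(-n + 1, n)
)
```

This structure is what the Akaike (1973) algorithm exploits to accelerate the Newton iterations.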

In summary, substituting the scaling of *B* from the theorem above yields the total computational complexity of the algorithm, where *r* describes the tail of the pdf (the gaussian pdf has an infinite value of *r*) and *l* is the number of points at which the estimate is evaluated.

## Appendix B: Change of Variables

Let $y = h(x)$, where $h$ is a one-to-one nonlinear function with inverse $x = h^{-1}(y)$. Then the pdf of $y$ follows from the change-of-variables formula, $p_y(y) = p_x\!\left(h^{-1}(y)\right)\left|\det \nabla h^{-1}(y)\right|$.

## Acknowledgments

Z.C. was supported by an NSF-CRCNS award (no. 1307645) from the U.S. National Science Foundation. M.A.W. is supported by NIH grant R01-MH06197, TR01-GM10498, and ONR-MURI grant N00014-10-1-0936. S.V.S. was supported by the U.S. National Science Foundation CAREER Award 1055560, the Burroughs Wellcome Fund CASI Award 1007274, and NSF EFRI-M3C.

## References

Agarwal, Chen, et al. (2015). *Nonparametric estimation of band-limited probability density functions*.