Abstract

Neural decoding may be formulated as dynamic state estimation (filtering) based on point-process observations, a generally intractable problem. Numerical sampling techniques are often practically useful for the decoding of real neural data. However, they are less useful as theoretical tools for modeling and understanding sensory neural systems, since they lead to limited conceptual insight into optimal encoding and decoding strategies. We consider sensory neural populations characterized by a distribution over neuron parameters. We develop an analytically tractable Bayesian approximation to optimal filtering based on the observation of spiking activity that greatly facilitates the analysis of optimal encoding in situations deviating from common assumptions of uniform coding. Continuous distributions are used to approximate large populations with few parameters, resulting in a filter whose complexity does not grow with population size and allowing optimization of population parameters rather than individual tuning functions. Numerical comparison with particle filtering demonstrates the quality of the approximation. The analytic framework leads to insights that are difficult to obtain from numerical algorithms and is consistent with biological observations about the distribution of sensory cells' preferred stimuli.

1  Introduction

Populations of sensory neurons encode information about the external world through their spiking activity. To understand this encoding, it is natural to model it as an optimal or near-optimal code in the context of some task performed by higher brain regions, using performance criteria such as decoding error or motor performance. A Bayesian theory of neural decoding is useful to characterize optimal encoding, as the computation of performance criteria typically involves the posterior distribution of the world state conditioned on spiking activity.

We model the external world state as a random process, observed through a set of sensory neuron-like elements characterized by multidimensional tuning functions, representing the elements' average firing rate (see Figure 1). The actual firing of each cell is random and is given by a point process (PP) with rate determined by the external state and the cell's tuning function (Dayan & Abbott, 2005). Under this model, decoding of sensory spike trains may be formulated as a filtering problem based on PP observations, thus falling within the purview of nonlinear filtering theory (Snyder & Miller, 1991; Brémaud, 1981). Inferring the hidden state under such circumstances has been widely studied within the computational neuroscience literature (Dayan & Abbott, 2005; Macke, Buesing, & Sahani, 2015). Beyond neuroscience, PP-based filtering has been used for position sensing and tracking in optical communication (Snyder, Rhodes, & Hoversten, 1977), control of computer communication networks (Segall, 1978), queuing (Brémaud, 1981), and econometrics (Frey & Runggaldier, 2001).

Figure 1:

Problem setting.


A significant amount of work has been devoted in recent years to the development of algorithms for fast approximation of the posterior distribution, leading to an extensive literature (see Macke et al., 2015, for a recent review). Much of this work is devoted to the development of effective sampling techniques, leading to highly performing finite-dimensional filters that can be applied profitably to real neural data. These approaches are usually formulated in discrete time, as befits implementation on digital computers, and lead to complex mathematical expressions for the posterior distributions, which are difficult to interpret qualitatively. In this work, we are less concerned with algorithmic issues and more with establishing closed-form analytic expressions for approximately optimal continuous time filters and using these to characterize the nature of near-optimal encoders, namely, determining the structure and distribution of tuning functions for optimal state inference. A significant advantage of the closed-form expressions over purely numerical techniques is the insight and intuition that is gained from them about qualitative aspects of the system. Moreover, the leverage gained by the analytic computation contributes to reducing the variance inherent in Monte Carlo approaches. Thus, in this work, we do not compare our results to algorithmically oriented discrete-time filters but, rather, to other continuous-time analytically expressible filters for dynamically varying signals, with the aim of gaining insight into optimal decoding and encoding within an analytic framework.

The problem of filtering a continuous-time diffusion process through PP observations is solved formally under general conditions in Snyder (1972; see also Segall, 1976, and Solo, 2000), where a stochastic partial differential equation (PDE) for the infinite-dimensional posterior state distribution is derived. However, this PDE is intractable in general and not easily amenable to qualitative or even numerical analysis. Some previous work has derived exact or approximate finite-dimensional filters under various simplifying assumptions. In many of these works (e.g., Komaee, 2010; Rhodes & Snyder, 1977; Susemihl, Meir, & Opper, 2013; Twum-Danso & Brockett, 2001; Yaeli & Meir, 2010), the tuning functions are chosen so that the total firing rate (i.e., the sum of firing rates of all neurons) is independent of the state, an assumption we refer to as uniform coding (see Figure 2). Rhodes and Snyder (1977) derived an exact finite-dimensional filter for the case of linear dynamics with gaussian noise observed through uniform coding with gaussian tuning functions.1 Komaee (2010) considered the more general setting of uniform coding with arbitrary tuning functions, where an approximate filter is obtained.

Figure 2:

The uniform coding property, $\sum_i \lambda_i(x) = \mathrm{const}$, holds approximately in homogeneous populations with tuning function centers located on a dense uniform grid, as demonstrated in (a) a one-dimensional case. (b) Nonuniform coding.

Other work derives the posterior distribution for non-Markovian state dynamics, modeled as a gaussian processes. Huys, Zemel, Natarajan, and Dayan (2007) derive the posterior exactly, but its computation is not recursive, requiring memory of the entire spike history. A recursive version for gaussian processes with a Matérn kernel autocorrelation is derived in Susemihl, Meir, and Opper (2011). Both of these works assume uniform coding with gaussian tuning functions.

For reasons of mathematical tractability, few previous analytically oriented works have studied neural decoding without the uniform coding assumption, despite the experimental relevance of nonuniform coding. We discuss such work in comparison to our work in section 6.

The problem of optimal encoding by neural populations has been studied mostly in the static case. A natural optimality criterion is the estimation mean square error (MSE). Some work (e.g., Harper & McAlpine, 2004; Ganguli & Simoncelli, 2014) optimizes Fisher information, which serves as a proxy to the MSE of unbiased estimators through the Cramér-Rao bound (Radhakrishna Rao, 1945) or, in the Bayesian setting, the Van Trees inequality (Gill & Levit, 1995). Fisher information of neural spiking activity is easy to compute analytically, at least in the static case (Dayan & Abbott, 2005), and it can be used without solving the decoding problem. This approach has been used to study nonuniform coding of static stimuli by heterogeneous populations in many works, including Chelaru and Dragoi (2008), Ecker, Berens, Tolias, and Bethge (2011), and Ganguli and Simoncelli (2014). However, optimizing Fisher information may yield misleading qualitative results regarding the MSE-optimal encoding (Bethge, Rotermund, & Pawelzik, 2002; Pilarski & Pokora, 2015; Yaeli & Meir, 2010). Although, under appropriate conditions, the inverse of the Fisher information approaches the minimum attainable MSE in the limit of infinite decoding time, it may be a poor proxy for the MSE at finite decoding times, which are of particular importance in natural settings and control problems. Exact computation of the estimation MSE is possible in some restricted settings; some work along those lines is discussed in section 6.2.

A possible alternative is the computation of estimation MSE for a given filter through Monte Carlo simulations. This approach is complicated by high variability between trials, which means many trials are necessary for each value of the parameters. Consequently, optimization becomes very time-consuming, and possibly impractical, when using numerical filters such as particle filtering or large neural populations with many parameters.

In this work, we derive an approximate online filter for the neural decoding problem in continuous time and demonstrate its use in investigating optimal neural encoding. We consider neural populations characterized by a distribution over neuron parameters. Continuous distributions are used to approximate large populations with few parameters, resulting in a filter whose complexity does not grow with the population size and allowing optimization of population parameters rather than individual tuning functions. We suggest further reducing computational complexity for the encoding problem by using the estimated posterior variance as an approximation to estimation MSE, as discussed in appendix C.

Technically, given the intractable infinite-dimensional nature of the posterior distribution, we use a projection method replacing the full posterior at each point in time by a projection onto a simple family of distributions (gaussian in our case). This approach, originally developed in the filtering literature (Brigo, Hanzon, & Le Gland, 1999; Maybeck, 1979) and termed assumed density filtering (ADF), has been successfully used more recently in machine learning (Opper, 1998; Minka, 2001). We derive approximate filters for gaussian tuning functions and for several distributions over tuning function centers, including the case of a finite population. These filters may be combined to obtain a filter for heterogeneous mixtures of homogeneous subpopulations. We are not aware of any previous work providing an effective closed-form filter for heterogeneous populations of sensory neurons characterized by a small number of parameters.

The main contributions of this letter are as follows:

• Derivation of closed-form recursive expressions for the continuous-time posterior mean and variance within the ADF approximation, in the context of large nonuniform populations characterized by a small number of parameters

• Demonstration of the quality of the ADF approximation by comparison to state-of-the-art particle filtering methods

• Characterization of optimal adaptation (encoding) for sensory cells in a more general setting than hitherto considered (nonuniform coding, dynamic signals)

• Demonstration of the interesting interplay between prior information and neuronal firing, showing how in certain situations, the absence of spikes can be highly informative (this phenomenon is absent under uniform coding).

Preliminary results discussed in this letter were presented at a conference (Harel, Meir, & Opper, 2015). These included only the special case of gaussian distribution of preferred stimuli. This letter provides a more general and rigorous formulation of the mathematical framework. By separately considering different terms in the approximate filter, we find that updates at spike time depend only on the tuning function of the spiking neuron and apply generally to any population distribution. For the updates between spikes, we provide closed-form expressions for cases that were not discussed in Harel et al. (2015)—namely, uniform populations on an interval and finite heterogeneous mixtures—and non-closed-form expressions for the general case, given as integrals involving the distribution of neuron parameters. We further supplement our previously published results with numerical evaluation of the filter's accuracy, an additional example application, and a detailed comparison with previous work.

2  Problem Overview

Consider a dynamical system with state $X_t \in \mathbb{R}^n$, observed through the firing patterns of $M$ sensory neurons, as illustrated in Figure 1. Each neuron fires stochastically and independently, with the $i$th neuron having firing rate $\lambda_i(X_t)$. (More detailed assumptions about the dynamics of the state and observation processes are described in later sections.) In this context, we are interested in the question of optimal encoding and decoding. By decoding, we mean computing (exactly or approximately) the full posterior distribution of $X_t$ given $\mathcal{N}_t$, the history of neural spikes up to time $t$. The problem of optimal encoding is then the problem of optimal sensory cell configuration: finding the optimal rate functions $\{\lambda_i(\cdot)\}_{i=1}^M$ so as to minimize some performance criterion. We assume the set $\{\lambda_i\}_{i=1}^M$ belongs to some parameterized family with parameter $\phi$.

To quantify the performance of the encoding-decoding system, we summarize the result of decoding using a single estimator $\hat{X}_t = \hat{X}_t(\mathcal{N}_t)$ and define the mean square error (MSE) as $\epsilon_t \triangleq \mathrm{trace}[(X_t - \hat{X}_t)(X_t - \hat{X}_t)^\top]$. We seek $\hat{X}_t$ and $\phi$ that solve $\min_\phi \min_{\hat{X}_t} E[\epsilon_t] = \min_\phi E[\min_{\hat{X}_t} E[\epsilon_t \mid \mathcal{N}_t]]$. The inner minimization problem in this equation is solved by the MSE-optimal decoder, which is the posterior mean $\hat{X}_t = \mu_t \triangleq E[X_t \mid \mathcal{N}_t]$. The posterior mean may be computed from the full posterior obtained by decoding. The outer minimization problem is solved by the optimal encoder. If decoding is exact, the problem of optimal encoding becomes that of minimizing the expected posterior variance. Note that although we assume a fixed parameter $\phi$ that does not depend on time, the optimal value of $\phi$ for which the minimum is obtained generally depends on the time $t$ at which the error is to be minimized. In principle, the encoding/decoding problem can be solved for any value of $t$. In order to assess performance, it is convenient to consider the steady-state limit $t \to \infty$ for the encoding problem.

Below, we approximately solve the decoding problem for any $t$. We then explore the problem of choosing the optimal encoding parameters $ϕ$ using Monte Carlo simulations in examples motivated by experimental results.

Having an efficient (closed-form) approximate filter allows performing the Monte Carlo simulations at a significantly reduced computational cost relative to numerical methods such as particle filtering. The computational cost is further reduced by averaging the computed posterior variance across trials rather than the squared error, thereby requiring fewer trials. The mean of the posterior variance equals the MSE (of the posterior mean) but has the advantage of being less noisy than the squared error itself, since by definition it is the mean of the squared error conditioned on $\mathcal{N}_t$.
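As a toy illustration of this variance-reduction argument (not part of the letter's model), the following sketch compares the two Monte Carlo estimators of the MSE. The per-trial posterior variances and all numerical values are arbitrary choices for the demonstration.

```python
import numpy as np

# Illustration: when the decoder reports a posterior variance, averaging it
# across trials estimates the MSE with less noise than averaging the squared
# error itself.  Per-trial posterior variances here are hypothetical.
rng = np.random.default_rng(0)
n_trials = 20000

# Hypothetical per-trial posterior variances (as a filter might produce).
post_var = rng.uniform(0.5, 1.5, size=n_trials)
# Draw the "true" state from each trial's posterior; the posterior mean is 0.
sq_err = rng.normal(0.0, np.sqrt(post_var)) ** 2

mse_from_var = post_var.mean()       # average posterior variance
mse_from_err = sq_err.mean()         # average squared error
# Both estimate the same MSE, but the squared error is far more variable
# across trials, so more trials are needed for the same accuracy:
print(mse_from_var, mse_from_err)
print(post_var.std(), sq_err.std())
```

Both printed estimates agree (up to Monte Carlo noise), while the per-trial spread of the squared error is several times larger than that of the posterior variance.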

3  Decoding

3.1  A Finite Population of Gaussian Neurons

For ease of exposition, we first formulate the problem for a finite population of neurons. We address a more general setting in subsequent sections.

3.1.1  State and Observation Model

The observed process $X$ is a diffusion process obeying the stochastic differential equation (SDE),2
$dX_t = A(X_t)\,dt + D(X_t)\,dW_t, \quad t \ge 0,$
(3.1)
where $A(\cdot), D(\cdot)$ are arbitrary functions such that equation 3.1 has a unique solution3 and $W_t$ is a standard Wiener process whose increments are independent of the history of all other random processes. The integral with respect to $dW_t$ is to be interpreted in the Itô sense. The initial condition $X_0$ is assumed to have a continuous distribution with a known density.
The observation processes are $\{N^i\}_{i=1}^M$, where $N_t^i$ is the spike count of the $i$th neuron up to time $t$. Denote by $\mathcal{N}_t = (N_{0,t}^i)_{i=1}^M$ the history of neural spikes up to time $t$ and by $N_t = \sum_{i=1}^M N_t^i$ the total number of spikes up to time $t$ from all neurons. We assume that the $i$th neuron fires with rate $\lambda_i(X_t)$ at time $t$, independent of the other neurons given the state history $X_{0,t}$. More explicitly, this means
$P\left(N_{t+h}^i - N_t^i = k \mid X_{0,t}, \mathcal{N}_t\right) = \begin{cases} \lambda_i(X_t)h + o(h), & k = 1 \\ 1 - \lambda_i(X_t)h + o(h), & k = 0 \\ o(h), & k > 1 \end{cases} \qquad i = 1, \ldots, M, \quad h \to 0^+,$
(3.2)
where $X_{0,t}$ denotes the history of $X$ up to time $t$, and $o(h)$ is little-o asymptotic notation, denoting any function satisfying $o(h)/h \to 0$ as $h \to 0^+$. Thus, each $N^i$ is a doubly stochastic Poisson process (DSPP; see, e.g., Snyder & Miller, 1991) with rate process $\lambda_i(X_t)$.4
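The observation model of equation 3.2 can be simulated directly by time discretization: in each bin of width $h$, a spike occurs with probability $\lambda_i(X_t)h + o(h)$. The sketch below uses an Ornstein-Uhlenbeck state and a hypothetical bell-shaped rate function; neither choice is prescribed by the text at this point.

```python
import numpy as np

# Minimal sketch: simulate a doubly stochastic Poisson process by time
# discretization of equation 3.2.  In each bin of width h, a spike occurs
# with probability lambda(X_t) * h.  State dynamics and the rate function
# are illustrative choices, not taken from the letter.
rng = np.random.default_rng(1)
h = 1e-3          # bin width
T = 50.0          # total simulation time
n_steps = int(T / h)

def rate(x):
    # hypothetical tuning: peak rate 10 at x = 0
    return 10.0 * np.exp(-0.5 * x ** 2)

x = 0.0
spike_times = []
for k in range(n_steps):
    # Ornstein-Uhlenbeck state: dX = -X dt + dW (Euler-Maruyama step)
    x += -x * h + np.sqrt(h) * rng.standard_normal()
    if rng.random() < rate(x) * h:        # P(spike in bin) ~ lambda(X_t) h
        spike_times.append(k * h)

print(len(spike_times))  # roughly E[lambda(X)] * T spikes
```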
To achieve mathematical tractability, we assume that the tuning functions $λi$ are gaussian. The firing rate of the $i$th neuron in response to state $x$ is given by
$\lambda_i(x) = h_i \exp\left(-\frac{1}{2}\left\|H_i x - \theta_i\right\|_{R_i}^2\right),$
(3.3)
where $\theta_i \in \mathbb{R}^m$ is the neuron's preferred location, $h_i \in \mathbb{R}^+$ is the neuron's maximal expected firing rate, $H_i \in \mathbb{R}^{m \times n}$ and $R_i \in \mathbb{R}^{m \times m}$, $m \le n$, are fixed matrices, each $R_i$ is positive definite, and the notation $\|y\|_M^2$ denotes $y^\top M y$. The inclusion of the matrix $H_i$ allows using high-dimensional models where only some dimensions are observed, for example, when the full state includes velocities but only locations are directly observable. In typical applications, $H_i$ would be the same across all neurons, or at least across all neurons of the same sensory modality.
In the sequel, we use the following standard notation,
$\int_a^b h(t)\,dN_t^i \triangleq \sum_j \mathbf{1}_{\{t_j^i \in [a,b]\}}\, h(t_j^i),$
(3.4)
for any function $h$, where $t_j^i$ is the time of the $j$th point of the process $N^i$. This is the usual Lebesgue integral of $h$ with respect to $N^i$ viewed as a discrete measure.
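In code, the counting integral of equation 3.4 is simply a sum over the spikes falling in the integration window; a minimal transcription:

```python
# The counting integral of equation 3.4 is a sum of h evaluated at the
# spike times falling in [a, b].
def pp_integral(h, spike_times, a, b):
    """Integral of h with respect to the point process N^i over [a, b]."""
    return sum(h(t) for t in spike_times if a <= t <= b)

# Example: integrating h(t) = t over spikes at 0.5, 1.5, 2.5 restricted
# to [0, 2] picks up the first two spikes only.
val = pp_integral(lambda t: t, [0.5, 1.5, 2.5], 0.0, 2.0)
print(val)  # 0.5 + 1.5 = 2.0
```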

3.1.2  Model Limitations

The model outlined above involves several simplifications to achieve tractability. Namely, tuning functions are assumed to be gaussian, and firing rates are assumed to be independent of state history and spike history given the current state (see equation 3.2), yielding a DSPP model (see note 4).

Gaussian tuning functions are a reasonable model for some neural systems but inadequate for others—for example, where the tuning is sigmoidal or there is a baseline firing rate regardless of stimulus value. For simplicity, we focus on the gaussian case in this work. It is straightforward to extend the derivation presented here to piecewise linear tuning, which may be used to represent sigmoidal tuning functions, but the resulting expressions are more cumbersome. We have also developed closed-form results for tuning functions given by sums of gaussians; however, these require further approximations in order to obtain analytic results and are not discussed in this letter.

The assumption of history-independent rates may also limit the model's applicability. Real sensory neurons exhibit firing history dependence in the form of refractory periods and rate adaptation (Dayan & Abbott, 2005), state history dependence such as input integration (Dayan & Abbott, 2005), or correlations between the firing of different neurons conditioned on the state (Pillow et al., 2008). These phenomena are captured by some encoding models, such as simple integrate-and-fire models, as well as more complex physiological models like the Hodgkin-Huxley model. However, the simplifying independence assumptions above are common to all work presenting closed-form, continuous-time filters for point-process observations that we are aware of.

Note that characterization of the point processes in terms of their history-conditioned firing rate, as opposed to finite-dimensional distributions, does not in itself restrict the model's generality in any substantial way (see Segall & Kailath, 1975, theorem 1). Rather, the independence assumptions are expressed rigorously by the fact that the right-hand side of equation 3.2 depends on neither previous values of $X$ nor previous spike times of any neuron. Some of our analysis applies without modification when rates are allowed to depend on spiking history (specifically, equations 3.6), so it may be possible to extend these techniques to some history-dependent models. However, when rates may depend on the state history, exact filtering may involve the posterior distribution of the entire state history rather than the current state, so that a different approach is probably required.

3.1.3  Exact Filtering Equations

Let $p_t^N(\cdot)$ be the posterior density of $X_t$ given the firing history $\mathcal{N}_t$, and $E_t^N[\cdot]$ the posterior expectation given $\mathcal{N}_t$. The prior density $p_0^N$ is assumed to be known. We denote by $\hat{\lambda}_t^i$ the rate of $N_t^i$ with respect to the history of spiking alone, that is, the rate that would appear on the right-hand side of equation 3.2 if the conditioning on the left were only on $\mathcal{N}_t$. These rates are given by5
$\hat{\lambda}_t^i = E_t^N\left[\lambda_i(X_t)\right] = \int p_t^N(x)\,\lambda_i(x)\,dx.$
The problem of filtering a diffusion process $X$ from a doubly stochastic Poisson process driven by $X$ is formally solved in Snyder (1972), where a stochastic PDE for the posterior density is derived:6
$dp_t^N(x) = \mathcal{L}^* p_t^N(x)\,dt + p_t^N(x) \sum_i \left(\frac{\lambda_i(x)}{\hat{\lambda}_t^i} - 1\right)\left(dN_t^i - \hat{\lambda}_t^i\,dt\right),$
(3.5)
where $\mathcal{L}$ is the state's infinitesimal generator (Kolmogorov's backward operator), defined as $\mathcal{L}h(x) = \lim_{\Delta t \to 0^+} \left(E\left[h(X_{t+\Delta t}) \mid X_t = x\right] - h(x)\right)/\Delta t$, and $\mathcal{L}^*$ is $\mathcal{L}$'s adjoint operator (Kolmogorov's forward operator). The notation $dN_t^i$ is interpreted as in equation 3.4, so this term contributes a jump of size $p_t^N(x)\left(\lambda_i(x)/\hat{\lambda}_t^i - 1\right)$ at a spike of the $i$th neuron. Equation 3.5 may be written in a notation more familiar for nonstochastic PDEs using Dirac delta functions,
$\frac{\partial}{\partial t} p_t^N(x) = \mathcal{L}^* p_t^N(x) + p_t^N(x) \sum_i \left(\frac{\lambda_i(x)}{\hat{\lambda}_t^i} - 1\right)\left(\dot{N}_t^i - \hat{\lambda}_t^i\right),$
where $\dot{N}_t^i \triangleq \sum_j \delta(t - t_j^i)$ is the spike train of the $i$th neuron, the formal derivative of the process $N^i$. An accessible, albeit nonrigorous, derivation of equation 3.5 via time discretization is found in Susemihl (2014, sect. 2.3).

The stochastic PDE, equation 3.5, is nonlinear and nonlocal (due to the dependence of $\hat{\lambda}_t^i$ on $p_t^N$) and therefore usually intractable. Rhodes and Snyder (1977) and Susemihl et al. (2014) consider linear dynamics with a gaussian prior and gaussian sensors with centers distributed uniformly over the state space. In this case, the posterior is gaussian, and equation 3.5 leads to closed-form ordinary differential equations (ODEs) for its mean and variance. In our more general setting, we can obtain exact equations for the posterior mean and variance, as follows.

Let $\mu_t \triangleq E_t^N[X_t]$, $\tilde{X}_t \triangleq X_t - \mu_t$, and $\Sigma_t \triangleq E_t^N[\tilde{X}_t \tilde{X}_t^\top]$. Using equation 3.5, along with known results about the form of the infinitesimal generator $\mathcal{L}$ for diffusion processes (e.g., Øksendal, 2003, theorem 7.3.3), the first two posterior moments can be shown to obey the following exact equations (see appendix A):
$d\mu_t = E_t^N\left[A(X_t)\right]dt + \sum_i E_{t^-}^N\left[\omega_{t^-}^i X_{t^-}\right]\left(dN_t^i - \hat{\lambda}_t^i\,dt\right),$
(3.6a)
$d\Sigma_t = E_t^N\left[A(X_t)\tilde{X}_t^\top + \tilde{X}_t A(X_t)^\top + D(X_t)D(X_t)^\top\right]dt + \sum_i E_{t^-}^N\left[\omega_{t^-}^i \tilde{X}_{t^-}\tilde{X}_{t^-}^\top\right]\left(dN_t^i - \hat{\lambda}_t^i\,dt\right) - \sum_i E_{t^-}^N\left[\omega_{t^-}^i X_{t^-}\right] E_{t^-}^N\left[\omega_{t^-}^i X_{t^-}\right]^\top dN_t^i,$
(3.6b)
where
$\omega_t^i \triangleq \frac{\lambda_i(X_t)}{\hat{\lambda}_t^i} - 1,$
and the expressions involving $t^-$ denote left limits, which are necessary since the solutions to equations 3.6 are discontinuous at spike times.

In contrast with the more familiar case of linear dynamics with gaussian white noise and the corresponding Kalman-Bucy filter (Maybeck, 1979), here the posterior variance is random and is generally not monotonically decreasing, even when estimating a constant state. However, noting that $E[dN_t^i - \hat{\lambda}_t^i\,dt] = 0$, we may observe from equation 3.6b that for a constant state ($A = D = 0$), the expected posterior variance $E[\Sigma_t]$ is decreasing, since the first two terms in equation 3.6b vanish in expectation.

We will find it useful to rewrite equations 3.6 in a different form, as follows,
$d\mu_t = d\mu_t^\pi + d\mu_t^c + d\mu_t^N, \qquad d\Sigma_t = d\Sigma_t^\pi + d\Sigma_t^c + d\Sigma_t^N,$
where $d\mu_t^\pi, d\Sigma_t^\pi$ are the prior terms, corresponding to $\mathcal{L}^* p_t^N(x)$ in equation 3.5, and the remaining terms are divided into continuous update terms $d\mu_t^c, d\Sigma_t^c$ (multiplying $dt$) and discontinuous update terms $d\mu_t^N, d\Sigma_t^N$ (multiplying $dN_t^i$). Using equations 3.6, we find the exact equations:
$d\mu_t^\pi = E_t^N\left[A(X_t)\right]dt,$
(3.7a)
$d\Sigma_t^\pi = E_t^N\left[A(X_t)\tilde{X}_t^\top + \tilde{X}_t A(X_t)^\top + D(X_t)D(X_t)^\top\right]dt,$
(3.7b)
$d\mu_t^c = -\sum_i E_t^N\left[\omega_t^i X_t\right]\hat{\lambda}_t^i\,dt,$
(3.7c)
$d\Sigma_t^c = -\sum_i E_t^N\left[\omega_t^i \tilde{X}_t\tilde{X}_t^\top\right]\hat{\lambda}_t^i\,dt,$
(3.7d)
$d\mu_t^N = \sum_i E_{t^-}^N\left[\omega_{t^-}^i X_{t^-}\right]dN_t^i,$
(3.7e)
$d\Sigma_t^N = \sum_i \left( E_{t^-}^N\left[\omega_{t^-}^i \tilde{X}_{t^-}\tilde{X}_{t^-}^\top\right] - E_{t^-}^N\left[\omega_{t^-}^i X_{t^-}\right] E_{t^-}^N\left[\omega_{t^-}^i X_{t^-}\right]^\top \right) dN_t^i.$
(3.7f)
The prior terms $d\mu_t^\pi, d\Sigma_t^\pi$ represent the known dynamics of $X$ and are the same terms appearing in the Kalman-Bucy filter. These would be the only terms left if no measurements were available and would vanish for a static state. The continuous update terms $d\mu_t^c, d\Sigma_t^c$ represent updates to the posterior between spikes that are not derived from $X$'s dynamics and therefore may be interpreted as corresponding to information obtained from the absence of spikes. The discontinuous update terms $d\mu_t^N, d\Sigma_t^N$ contribute a change to the posterior at spike times, depending on the spike's origin $i$, and thus represent information obtained from the presence of a spike as well as the parameters of the spiking neuron.

Note that the gaussian tuning assumption 3.3 has not been used in this section, and equations 3.7 are valid for any form of $\lambda_i$.

3.1.4  ADF Approximation

While equations 3.7 are exact, they are not practical, since they require computation of the posterior expectations $E_t^N[\cdot]$. To bring them to closed form, we use ADF with an assumed gaussian density (see Opper, 1998, for details). Informally, this may be envisioned as integrating equations 3.7 while replacing the distribution $p_t^N$ by its approximating gaussian "at each time step." The approximating gaussian is obtained by matching the first two moments of $p_t^N$ (Opper, 1998). Note that the solution of the resulting equations does not in general match the first two moments of the exact solution, though it may approximate them. Practically, the ADF approximation amounts to substituting the normal distribution $\mathcal{N}(\mu_t, \Sigma_t)$ for $p_t^N$ when computing the expectations in equations 3.7. This heuristic may be justified by its relation to a projection method, where the right-hand side of the density PDE is projected onto the tangent space of the approximating family of densities: the two approaches are equivalent when the approximating family is exponential (Brigo et al., 1999).

If the dynamics are linear, the prior updates, equations 3.7a and 3.7b, are easily computed in closed form after this substitution. Specifically, for $dX_t = AX_t\,dt + D\,dW_t$, the prior updates read
$d\mu_t^\pi = A\mu_t\,dt,$
(3.8a)
$d\Sigma_t^\pi = \left(A\Sigma_t + \Sigma_t A^\top + DD^\top\right)dt,$
(3.8b)
as in the Kalman-Bucy filter (Maybeck, 1979). Otherwise, they may be approximated by expanding the nonlinear functions $A(x)$ and $D(x)D(x)^\top$ as power series and applying the assumed gaussian density, resulting in tractable gaussian integrals. The use of ADF in the prior terms is outside the scope of this work (see, e.g., Maybeck, 1979, chap. 12).

We therefore turn to the approximation of the nonprior updates, equations 3.7c to 3.7f, in the case of gaussian tuning, equation 3.3.

Abusing notation, from here on we use $\mu_t$, $\Sigma_t$, and $p_t^N(x)$ to refer to the ADF approximation rather than to the exact values. Applying the gaussian ADF approximation $p_t^N(x) \approx \mathcal{N}(x; \mu_t, \Sigma_t)$ in the case of gaussian tuning functions 3.3 yields the nonprior terms
$d\mu_t^c = \sum_i \Sigma_t H_i^\top S_t^i \delta_t^i\,\hat{\lambda}_t^i\,dt,$
(3.9a)
$d\Sigma_t^c = \sum_i \Sigma_t H_i^\top \left(S_t^i - S_t^i \delta_t^i (\delta_t^i)^\top S_t^i\right) H_i \Sigma_t\,\hat{\lambda}_t^i\,dt,$
(3.9b)
$d\mu_t^N = -\sum_i \Sigma_{t^-} H_i^\top S_{t^-}^i \delta_{t^-}^i\,dN_t^i,$
(3.9c)
$d\Sigma_t^N = -\sum_i \Sigma_{t^-} H_i^\top S_{t^-}^i H_i \Sigma_{t^-}\,dN_t^i,$
(3.9d)
$\hat{\lambda}_t^i = h_i \sqrt{\left|S_t^i R_i^{-1}\right|}\, \exp\left(-\frac{1}{2}\left\|\delta_t^i\right\|_{S_t^i}^2\right),$
(3.9e)
where
$\delta_t^i \triangleq H_i\mu_t - \theta_i, \qquad S_t^i \triangleq \left(R_i^{-1} + H_i\Sigma_t H_i^\top\right)^{-1}.$
(3.9f)
These equations are a special case of equations 3.16, which are derived in appendix A. The updates for the posterior precision $\Sigma_t^{-1}$ have a simpler form, also derived in appendix A:
$d\Sigma_t^{-1,c} = -\sum_i H_i^\top \left(S_t^i - S_t^i \delta_t^i (\delta_t^i)^\top S_t^i\right) H_i\,\hat{\lambda}_t^i\,dt,$
(3.10a)
$d\Sigma_t^{-1,N} = \sum_i H_i^\top R_i H_i\,dN_t^i.$
(3.10b)
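The covariance jump in equation 3.9d and the precision jump in equation 3.10b describe the same spike update, related by the Woodbury matrix identity. A quick numerical consistency check with arbitrary test matrices:

```python
import numpy as np

# Numerical consistency check (illustrative values): the covariance jump of
# equation 3.9d agrees with the precision jump of equation 3.10b, by the
# Woodbury identity.  H, R, and Sigma below are arbitrary test matrices.
rng = np.random.default_rng(2)
n, m = 3, 2
H = rng.standard_normal((m, n))
A = rng.standard_normal((m, m))
R = A @ A.T + np.eye(m)              # positive-definite tuning precision
B = rng.standard_normal((n, n))
Sigma = B @ B.T + np.eye(n)          # pre-spike posterior covariance

# S_t^i = (R^-1 + H Sigma H^T)^-1, as in equation 3.9f
S = np.linalg.inv(np.linalg.inv(R) + H @ Sigma @ H.T)
Sigma_plus_cov = Sigma - Sigma @ H.T @ S @ H @ Sigma                  # via eq. 3.9d
Sigma_plus_prec = np.linalg.inv(np.linalg.inv(Sigma) + H.T @ R @ H)   # via eq. 3.10b

print(np.max(np.abs(Sigma_plus_cov - Sigma_plus_prec)))  # near machine precision
```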
In the scalar case $m = n = 1$, with $H = 1$, $\sigma_t^2 = \Sigma_t$, and $\alpha_i^2 = R_i^{-1}$, the update equations 3.9 and 3.10 read
$d\mu_t^c = \sum_i \frac{\sigma_t^2}{\sigma_t^2 + \alpha_i^2}\left(\mu_t - \theta_i\right)\hat{\lambda}_t^i\,dt,$
(3.11a)
$d\sigma_t^{2,c} = \sum_i \frac{\sigma_t^2}{\sigma_t^2 + \alpha_i^2}\left(1 - \frac{(\mu_t - \theta_i)^2}{\sigma_t^2 + \alpha_i^2}\right)\sigma_t^2\,\hat{\lambda}_t^i\,dt,$
(3.11b)
$d\sigma_t^{-2,c} = \sum_i \frac{1}{\sigma_t^2 + \alpha_i^2}\left(\frac{(\mu_t - \theta_i)^2}{\sigma_t^2 + \alpha_i^2} - 1\right)\hat{\lambda}_t^i\,dt,$
(3.11c)
$d\mu_t^N = \sum_i \frac{\sigma_{t^-}^2}{\sigma_{t^-}^2 + \alpha_i^2}\left(\theta_i - \mu_{t^-}\right)dN_t^i,$
(3.11d)
$d\sigma_t^{2,N} = -\sum_i \frac{\sigma_{t^-}^2}{\sigma_{t^-}^2 + \alpha_i^2}\,\sigma_{t^-}^2\,dN_t^i,$
(3.11e)
$d\sigma_t^{-2,N} = \sum_i \alpha_i^{-2}\,dN_t^i,$
(3.11f)
$\hat{\lambda}_t^i = h_i \sqrt{2\pi\alpha_i^2}\;\mathcal{N}\left(\mu_t;\, \theta_i,\, \sigma_t^2 + \alpha_i^2\right).$
(3.11g)
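The scalar filter above can be sketched as a compact Euler-discretized simulation, combining the prior updates 3.8 with the updates 3.11 in the two-neuron setting of Figure 3. The time step, horizon, and random seed are arbitrary choices; this is a sketch, not the authors' reference implementation.

```python
import numpy as np

# Sketch: Euler discretization of the scalar ADF filter, equations 3.8 and
# 3.11, for the two-neuron setup of Figure 3 (dX = -X dt + dW).
rng = np.random.default_rng(3)
dt, T = 1e-3, 20.0
theta = np.array([-1.2, 1.2])        # preferred stimuli
hmax = np.array([10.0, 5.0])         # maximal rates h_i
alpha2 = np.array([0.5, 0.5])        # tuning widths alpha_i^2 = R_i^-1

x = rng.normal(0.0, np.sqrt(0.5))    # state from the steady state N(0, 0.5)
mu, var = 0.0, 0.5                   # filter initialized at the prior

for _ in range(int(T / dt)):
    x += -x * dt + np.sqrt(dt) * rng.standard_normal()     # OU state step
    lam = hmax * np.exp(-0.5 * (x - theta) ** 2 / alpha2)  # true rates, eq. 3.3
    spikes = rng.random(2) < lam * dt                      # eq. 3.2, discretized
    # expected rates under the Gaussian posterior, eq. 3.11g
    lam_hat = hmax * np.sqrt(alpha2 / (var + alpha2)) \
        * np.exp(-0.5 * (mu - theta) ** 2 / (var + alpha2))
    # prior terms (eqs. 3.8) plus continuous updates (eqs. 3.11a, 3.11b)
    dmu = -mu * dt + np.sum(var / (var + alpha2) * (mu - theta) * lam_hat) * dt
    dvar = (-2 * var + 1) * dt + np.sum(
        var / (var + alpha2) * (1 - (mu - theta) ** 2 / (var + alpha2))
        * var * lam_hat) * dt
    mu, var = mu + dmu, var + dvar
    for i in np.flatnonzero(spikes):                       # eqs. 3.11d, 3.11e
        gain = var / (var + alpha2[i])
        mu, var = mu + gain * (theta[i] - mu), var - gain * var

print(mu, var)   # approximate posterior mean and variance at time T
```

Between spikes the mean drifts away from the nearest preferred stimulus, and at each spike the mean jumps toward the spiking neuron's center while the variance drops, reproducing the qualitative behavior of Figure 3.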
Figure 3 illustrates the filter 3.11 in a one-dimensional example.
Figure 3:

Filtering example with two sensory neurons. A dynamic state $dX_t = -X_t\,dt + dW_t$ is observed by two gaussian neurons, equation 3.3, with preferred stimuli $\theta_1 = -1.2$, $\theta_2 = 1.2$, and filtered according to equations 3.8 and 3.9. Each dot corresponds to a spike, with the vertical location indicating the neuron's preferred stimulus $\theta_i = \pm 1.2$. The approximate posterior mean $\mu_t$ is shown in blue, with the posterior standard deviation $\sigma_t$ indicated by the shaded areas. The curves to the right of the graph show the tuning functions of the two neurons. The bottom graph shows the posterior precision $\sigma_t^{-2}$. Neuron parameters are $h_1 = 10$, $h_2 = 5$, $H_i = 1$, $R_i^{-1} = \alpha_i^2 = 0.5$ $(i = 1, 2)$. The process and the filter were both initialized from the steady-state distribution of the dynamics, which is $\mathcal{N}(0, 0.5)$. The dynamics were discretized with time step $\Delta t = 10^{-3}$.


3.1.5  Interpretation

To gain some insight into the filtering equations, we consider the discontinuous updates, equations 3.9c and 3.9d, and continuous updates, equations 3.9a and 3.9b, in some special cases, in reference to the example presented in Figure 3.

Discontinuous updates. Consider the case $H_i = I$. As seen from the discontinuous update equations 3.9c and 3.9d, when the $i$th neuron spikes, the posterior mean moves toward its preferred location $\theta_i$, and the posterior variance decreases (in the sense that $\Sigma_{t^+} - \Sigma_{t^-}$ is negative definite), as seen in Figure 3. Neither update depends on $h_i$.

For general $H_i \in \mathbb{R}^{m \times n}$ of full row rank, let $H_i^r$ be any right inverse of $H_i$ and $\bar{\theta}_i = H_i^r \theta_i$. Note that $H_i$ projects the state $X_t$ to the coordinates employed by the $i$th neuron, which we refer to as perceptual coordinates; thus, $\bar{\theta}_i$ may be interpreted as the tuning function center in state coordinates, whereas $\theta_i$ is in the neuron's perceptual coordinates. We may rewrite equation 3.3 in state coordinates as
$\lambda_i(x) = h_i \exp\left(-\frac{1}{2}\left\|x - \bar{\theta}_i\right\|_{H_i^\top R_i H_i}^2\right).$
Now, the updates for a spike of neuron $i$ at time $t$ can be written more intuitively (see appendix A) as
$\mu_{t^+} = \left(\Sigma_{t^-}^{-1} + H_i^\top R_i H_i\right)^{-1}\left(\Sigma_{t^-}^{-1}\mu_{t^-} + H_i^\top R_i \theta_i\right) = \left(\Sigma_{t^-}^{-1} + H_i^\top R_i H_i\right)^{-1}\left(\Sigma_{t^-}^{-1}\mu_{t^-} + H_i^\top R_i H_i \bar{\theta}_i\right), \qquad \Sigma_{t^+}^{-1} = \Sigma_{t^-}^{-1} + H_i^\top R_i H_i.$
(3.12)
Thus, the new posterior mean is a weighted average of the prespike posterior mean and the preferred stimulus in state coordinates. The posterior precision $\Sigma_t^{-1}$ increases by $H_i^\top R_i H_i$, which is the tuning function precision matrix in state coordinates. This may be observed in Figure 3, where the posterior precision increases at each spike time by the fixed amount $R_i = \alpha_i^{-2} = 2$.
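The weighted-average form of equation 3.12 is exactly the Bayesian update obtained by multiplying the gaussian posterior with the gaussian tuning function and renormalizing. A scalar numerical check with illustrative values:

```python
import numpy as np

# Numeric check (scalar case, illustrative values): the spike update of
# equation 3.12 coincides with the Bayes update p(x) * lambda_i(x),
# renormalized, computed here on a fine grid.
mu, var = 0.3, 0.8          # pre-spike posterior (arbitrary values)
theta, alpha2 = 1.2, 0.5    # spiking neuron's center and width (R^-1)

# Closed-form update, eq. 3.12 with H = 1, R = 1/alpha2:
prec_new = 1.0 / var + 1.0 / alpha2
mu_new = (mu / var + theta / alpha2) / prec_new

# Grid-based Bayes update of the same densities:
x = np.linspace(-10, 10, 200001)
post = np.exp(-0.5 * (x - mu) ** 2 / var)
lam = np.exp(-0.5 * (x - theta) ** 2 / alpha2)  # peak rate cancels on normalizing
w = post * lam
w /= w.sum()
mu_grid = np.sum(w * x)
var_grid = np.sum(w * (x - mu_grid) ** 2)

print(mu_new, 1.0 / prec_new)
print(mu_grid, var_grid)   # agrees with the closed form to high accuracy
```

Note that the maximal rate $h_i$ drops out of the normalization, consistent with the observation above that the discontinuous updates do not depend on $h_i$.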
Continuous updates. The continuous mean update, equation 3.9a, contributing between spiking events, also admits an intuitive interpretation in the case where all neurons share the same shape matrices $H_i = H$, $R_i = R$. In this case, the equation reads
$d\mu_t^c = \Sigma_t H^\top \left(R^{-1} + H\Sigma_t H^\top\right)^{-1}\left(H\mu_t - \sum_i \theta_i \nu_t^i\right)\sum_i \hat{\lambda}_t^i\,dt,$
where $\nu_t^i \triangleq \hat{\lambda}_t^i / \sum_j \hat{\lambda}_t^j$. The normalized rates $\nu_t^i$ may be interpreted heuristically as the distribution of the next firing neuron's index $i$, provided the next spike occurs immediately (Brémaud, 1981, theorem T15). Thus, the absence of spikes drives the posterior mean away from the expected preferred stimulus of the next spiking neuron. The strength of this effect scales with $\sum_i \hat{\lambda}_t^i$, the total expected rate of spikes given the firing history. This behavior is qualitatively similar to the result obtained in Bobrowski, Meir, and Eldar (2009) for a finite population of neurons observing a continuous-time finite-state Markov process, where the posterior probability between spikes concentrates on states with lower total firing rate.

This behavior may be observed in Figure 3. When the posterior mean $\mu_t$ is near a neuron's preferred stimulus, it moves away from it between spikes, as the next spike is expected from that neuron. Similarly, despite the symmetry of the two neurons' preferred stimuli relative to the starting estimate $\mu_0$, the posterior mean shifts at the start of the trial toward the preferred stimulus of the second neuron due to its lower firing rate.

The continuous variance update, equation 3.9b, consists of the difference of two positive semidefinite terms; accordingly, the posterior variance may increase or decrease between spikes along various directions. In Figure 3, the posterior variance decreases before the first spike and increases between spikes afterward.

3.2  Continuous Population Approximation

3.2.1  Motivation

The filtering equations 3.9a to 3.9f implement sensory decoding for nonuniform populations. However, their applicability to studying neural encoding in large heterogeneous populations is limited for two closely related reasons. First, the computational cost of the filter is linear in the number of neurons. Second, the size of the parameter space describing the population is also linear in the number of neurons, making optimization of large populations computationally costly. These traits are shared with other filters designed for heterogeneous populations, namely Bobrowski et al. (2009), Eden and Brown (2008), and Twum-Danso and Brockett (2001).

To reduce the parameter space and simplify the filter, we approximate large neural populations by an infinite continuous population, characterized by a distribution over neuron parameters. These distributions are described by few parameters, resulting in a filter whose complexity does not grow with the population size and allowing optimization of population parameters rather than individual tuning functions.

For example, consider a homogeneous population of neurons with preferred stimuli equally spaced on an interval, as depicted in Figure 4a. If the population is large, its firing pattern statistics may be modeled as an infinite population of neurons, with preferred stimuli uniformly distributed on the same interval, as in Figure 4c. In this continuous population model, each spike is characterized by the preferred stimulus of the firing neuron, which is a continuous variable, rather than by the neuron's index. Such a population is parameterized by only two variables representing the end points of the interval, in addition to the tuning function height and width parameters. There is no need for a parameter representing the density of neurons on the interval, as scaling the density of neurons is equivalent to identical scaling of each neuron's maximum firing rate.
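The equivalence between neuron density and maximum firing rate is easy to verify numerically. In the following sketch (all numbers hypothetical), the total firing rate of a finite population of gaussian neurons at a fixed state is compared with the integral over its continuous approximation, whose density times tuning height matches the finite population:

```python
import numpy as np

# A finite population: M gaussian neurons (height h, width alpha) with
# preferred stimuli equally spaced on [a, b], observing a fixed state x.
a, b, M, h, alpha, x = -1.0, 1.0, 200, 1.0, 0.2, 0.3
thetas = np.linspace(a, b, M)
rate_discrete = np.sum(h * np.exp(-(x - thetas) ** 2 / (2 * alpha ** 2)))

# Continuous approximation: uniform density of M / (b - a) neurons per unit
# length; scaling this density is equivalent to scaling h, since only the
# product enters the intensity.
grid = np.linspace(a, b, 10_000)
density = M / (b - a)
rate_continuous = np.sum(
    density * h * np.exp(-(x - grid) ** 2 / (2 * alpha ** 2))
) * (grid[1] - grid[0])
```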

Figure 4:

Finite (discrete) and infinite (continuous) population models. (a) A finite population of six neurons with preferred stimuli $\theta_i$ ($i=0,\ldots,5$). The tuning functions are depicted on the left. The spiking pattern of each neuron in response to the stimulus $X_t$ (orange line) is represented by a separate point process $N_t^i$ (blue vertical ticks). (b) A representation of the same neural response as a single marked point process. Each blue dot represents a spike, with the vertical location representing its mark $\theta$. (c) A continuous population model approximating the statistics of the same population. The approximation would be better for larger populations. The population density $f(d\theta)/d\theta$ is depicted to the left of the plot, and the tuning function corresponding to one of the spikes on the right.


3.2.2  Marked Point Processes as Continuous Population Models

We now change the mathematical formulation and notation of our model to accommodate parameterized continuous populations. The new formulation is more general and includes finite populations as a special case. Rather than using a sequence of tuning function parameters, such as $(y_i)_{i=1}^M = (h_i, \theta_i, H_i, R_i)_{i=1}^M$ in the case of gaussian neurons (equation 3.3), we characterize the population by a measure $f(dy)$, where $f(Y)$ counts the neurons with parameters in a set $Y$, up to some multiplicative constant. A continuous measure $f$ may be used to approximate a large population. Accordingly, we write $\lambda(x; y)$ for the tuning function of a neuron with parameters $y$, in lieu of the previous notation $\lambda_i(x)$. For example, the gaussian case, equation 3.3, takes the form
$\lambda(x; y) = \lambda(x; h, \theta, H, R) \triangleq h\exp\left(-\tfrac{1}{2}\left\|Hx - \theta\right\|_R^2\right),$
(3.13)
with $y = (h, \theta, H, R)$ the parameters of the gaussian tuning function.
Similarly, instead of the observation processes $\{N^i\}_{i=1}^M$ counting the spikes of each neuron, we describe the spikes of all neurons using a single marked point process $N$ (Snyder & Miller, 1991), which is a random sequence of pairs $(t_k, y_k)$, where $t_k \in [0,\infty)$ is the time of the $k$th point and $y_k \in \mathcal{Y}$ its mark. In our case, $t_k$ is the $k$th spike time, and $y_k \in \mathcal{Y}$ are the parameters of the spiking neuron. Alternatively, $N$ may be described as a random discrete measure, where $N([s,t]\times Y)$ is the number of spikes in the time interval $[s,t]$ from neurons with parameters in the set $Y$. In line with the discrete measure view, we write, for an arbitrary function $h$,
$\int_{[s,t]\times Y} h(\tau, y)\,N(d\tau, dy) = \sum_k \mathbf{1}_{\{s \le t_k \le t,\, y_k \in Y\}}\, h(t_k, y_k),$
which is the ordinary Lebesgue integral of $h$ with respect to the discrete measure $N$. We use the notation $N_t(Y) \triangleq N([0,t]\times Y)$, and when $Y = \mathcal{Y}$, we omit it and write $N_t$ for the total number of spikes up to time $t$. As before, $\mathcal{N}_t$ denotes the history up to time $t$, here including both spike times and marks.
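As a concrete sketch, a marked point process realization can be stored as parallel arrays of times and marks, with $N_t(Y)$ and integrals against $N$ reducing to sums over spikes (the spike data below are hypothetical):

```python
import numpy as np

# A marked point process stored as spike times t_k with marks y_k (here the
# preferred stimulus of the spiking neuron).
times = np.array([0.12, 0.35, 0.36, 0.80])
marks = np.array([-0.5, 0.1, 0.4, -0.2])

def N(t, mark_set=lambda y: np.ones_like(y, bool)):
    """N_t(Y) = number of spikes in [0, t] with marks in the set Y,
    given as a boolean predicate over marks."""
    return int(np.sum((times <= t) & mark_set(marks)))

def integrate(h, t):
    """Integral of h over [0, t] x Y against N: the sum of h(t_k, y_k)
    over all spikes with t_k <= t."""
    sel = times <= t
    return float(np.sum(h(times[sel], marks[sel])))
```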

Figure 4 illustrates how the activity of a neural population may be represented as a marked point process and how the firing statistics are approximated by a continuous population. In this case, a homogeneous population of neurons with equally spaced preferred stimuli is approximated by a continuous population with uniformly distributed preferred stimuli.

To characterize the statistics of $N$, we first consider the finite population case. In this case, $f = \sum_i \delta_{y_i}$, where $\delta_{y_i}$ is the point mass at $y_i$, and the rate of points with marks in a set $Y \subseteq \mathcal{Y}$ at time $t$ (conditioned on $\mathcal{N}_t, X_{[0,t]}$) is
$\sum_{i: y_i \in Y} \lambda(X_t; y_i) = \int_Y \lambda(X_t; y)\,f(dy).$
In the case of a general population distribution $f$, we similarly take the integral $\int_Y \lambda(X_t; y)\,f(dy)$ as the rate (or intensity) of points with marks in $Y$ at time $t$, conditioned on $(\mathcal{N}_t, X_{[0,t]})$. The random measure $\lambda(X_t; y)\,f(dy)$ appearing in this integral is termed the intensity kernel of the marked point process $N$ with respect to the history $(\mathcal{N}_t, X_{[0,t]})$ (e.g., Brémaud, 1981). The dynamics of $N$ may be described heuristically by means of the intensity kernel as
$E\left[N(dt, dy)\,\middle|\,X_{[0,t]}, \mathcal{N}_t\right] = \lambda(X_t; y)\,f(dy)\,dt.$
(3.14)
The intensity kernel with respect to $\mathcal{N}_t$ alone is given by
$E\left[N(dt, dy)\,\middle|\,\mathcal{N}_t\right] = \hat\lambda_t(y)\,f(dy)\,dt,$
(3.15)
where
$\hat\lambda_t(y) \triangleq E\left[\lambda(X_t; y)\,\middle|\,\mathcal{N}_t\right] = E_t^N\left[\lambda(X_t; y)\right].$
We denote the rate of the unmarked process $N_t$ with respect to $\mathcal{N}_t$ (i.e., the total posterior expected firing rate) by
$\hat\lambda_t(f) \triangleq \int \hat\lambda_t(y)\,f(dy).$

3.2.3  Filtering

Assume gaussian tuning functions, equation 3.13. Writing $y \triangleq (h, \theta, H, R)$, the filtering equations 3.9a to 3.9e take the form
$d\mu_t^c = \int_{\mathcal{Y}} \Sigma_t H^\top S_t(H,R)\,\delta_t(H,\theta)\,\hat\lambda_t(y)\,f(dy)\,dt$
(3.16a)
$d\Sigma_t^c = \int_{\mathcal{Y}} \Sigma_t H^\top\left[S_t(H,R) - S_t(H,R)\,\delta_t(H,\theta)\,\delta_t(H,\theta)^\top S_t(H,R)\right]H\Sigma_t\,\hat\lambda_t(y)\,f(dy)\,dt$
(3.16b)
$d\mu_t^N = -\int_{\mathcal{Y}} \Sigma_{t^-} H^\top S_{t^-}(H,R)\,\delta_{t^-}(H,\theta)\,N(dt, dy)$
(3.16c)
$d\Sigma_t^N = -\int_{\mathcal{Y}} \Sigma_{t^-} H^\top S_{t^-}(H,R)\,H\Sigma_{t^-}\,N(dt, dy)$
(3.16d)
$\delta_t(H,\theta) \triangleq H\mu_t - \theta,$
(3.16e)
$S_t(H,R) \triangleq \left(R^{-1} + H\Sigma_t H^\top\right)^{-1},$
(3.16f)
$\hat\lambda_t(y) \triangleq \hat\lambda_t(h,\theta,H,R) = h\sqrt{\frac{\left|S_t(H,R)\right|}{\left|R\right|}}\exp\left(-\tfrac{1}{2}\left\|\theta - H\mu_t\right\|^2_{S_t(H,R)}\right),$
(3.16g)
and the posterior precision updates, equations 3.10a and 3.10b, become
$d\Sigma_t^{-1,c} = -\int_{\mathcal{Y}} H^\top\left[S_t(H,R) - S_t(H,R)\,\delta_t(H,\theta)\,\delta_t(H,\theta)^\top S_t(H,R)\right]H\,\hat\lambda_t(y)\,f(dy)\,dt,$
(3.16h)
$d\Sigma_t^{-1,N} = \int_{\mathcal{Y}} H^\top R H\,N(dt, dy).$
(3.16i)
The derivations of these equations, as well as filtering equations for special cases considered in this section, are in appendix A.

Using equation 3.16g, we may evaluate the continuous update equations 3.16a and 3.16b in closed form for some specific forms of the population distribution $f$. Note that the discontinuous update equations 3.16c and 3.16d do not depend on $f$ and are already in closed form. We now consider several population distributions where the continuous updates may be brought to closed form.

Single neuron. The result for a single neuron with parameters $(h, \theta, H, R)$ is trivial to obtain from equations 3.16a and 3.16b, yielding
$d\mu_t^c = \Sigma_t H^\top S_t(H,R)\left(H\mu_t - \theta\right)\hat\lambda_t(f)\,dt,$
(3.17a)
$d\Sigma_t^c = \Sigma_t H^\top\left[S_t(H,R) - S_t(H,R)\left(H\mu_t - \theta\right)\left(H\mu_t - \theta\right)^\top S_t(H,R)\right]H\Sigma_t\,\hat\lambda_t(f)\,dt,$
(3.17b)
where $\hat\lambda_t(f) = \hat\lambda_t(y)$ as given by equation 3.16g, and $S_t(H,R)$ is defined in equation 3.16f.
Uniform population. Here all neurons share the same height $h$ and shape matrices $H, R$, whereas the location parameter $\theta$ covers $\mathbb{R}^m$ uniformly: $f(dh', d\theta, dH', dR') = \delta_h(dh')\,\delta_H(dH')\,\delta_R(dR')\,d\theta$, where $\delta_x$ is the Dirac measure at $x$, $\delta_x(A) = \mathbf{1}_{\{x \in A\}}$, and $d\theta$ denotes the Lebesgue measure in the parameter $\theta$. A straightforward calculation from equations 3.16a, 3.16b, and 3.16g yields
$d\mu_t^c = 0, \qquad d\Sigma_t^c = 0,$
(3.18)
in agreement with the (exact) result obtained in the uniform coding setting of Rhodes and Snyder (1977), where the filtering equations include only the prior term and the discontinuous update term.
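This vanishing of the continuous updates can be checked numerically by integrating the update integrands of equations 3.16a and 3.16b over a wide grid of preferred stimuli, approximating the uniform population (scalar case, $H = 1$; parameter values hypothetical):

```python
import numpy as np

# Numerical check of equation 3.18: for a uniform population (theta covering
# the real line), the continuous updates vanish.  Scalar case, H = 1.
h, alpha2, sigma2, mu = 1.0, 0.25, 0.7, 0.3
S = 1.0 / (alpha2 + sigma2)                     # S_t(H, R), equation 3.16f
theta = np.linspace(mu - 20, mu + 20, 200_001)  # wide grid ~ uniform f(dtheta)
dtheta = theta[1] - theta[0]
# posterior expected rate of a neuron at theta, equation 3.16g (scalar form)
lam_hat = h * np.sqrt(S * alpha2) * np.exp(-0.5 * S * (theta - mu) ** 2)

# integrands of equations 3.16a and 3.16b, integrated over theta
d_mu = sigma2 * S * np.sum((mu - theta) * lam_hat) * dtheta
d_var = sigma2**2 * np.sum((S - S**2 * (mu - theta) ** 2) * lam_hat) * dtheta
```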
Gaussian population. As in the uniform population case, we assume all neurons share the same height $h$ and shape matrices $H, R$ and differ only in the location parameter $\theta$. Abusing notation slightly, we write $f(dh', d\theta, dH', dR') = \delta_h(dh')\,\delta_H(dH')\,\delta_R(dR')\,f(d\theta)$, where the preferred stimuli are normally distributed,
$f(d\theta) = \mathcal{N}\left(\theta; c, \Sigma_{pop}\right)d\theta = (2\pi)^{-m/2}\left|\Sigma_{pop}\right|^{-1/2}\exp\left(-\tfrac{1}{2}\left\|\theta - c\right\|^2_{\Sigma_{pop}^{-1}}\right)d\theta,$
(3.19)
for fixed $c \in \mathbb{R}^m$ and positive-definite $\Sigma_{pop}$.

We take $f$ to be normalized, since any scaling of $f$ may be included in the coefficient $h$ in equation 3.13, resulting in the same point process. Thus, when used to approximate a large population, the coefficient $h$ would be proportional to the number of neurons.

The continuous updates then take the form
$d\mu_t^c = \Sigma_t H^\top Z_t(H,R)\left(H\mu_t - c\right)\hat\lambda_t(f)\,dt,$
(3.20a)
$d\Sigma_t^c = \Sigma_t H^\top\left[Z_t(H,R) - Z_t(H,R)\left(H\mu_t - c\right)\left(H\mu_t - c\right)^\top Z_t(H,R)\right]H\Sigma_t\,\hat\lambda_t(f)\,dt,$
(3.20b)
where
$Z_t(H,R) \triangleq \left(\Sigma_{pop} + R^{-1} + H\Sigma_t H^\top\right)^{-1},$
(3.20c)
$\hat\lambda_t(f) \triangleq \int \hat\lambda_t(y)\,f(dy) = h\sqrt{\frac{\left|Z_t(H,R)\right|}{\left|R\right|}}\exp\left(-\tfrac{1}{2}\left\|H\mu_t - c\right\|^2_{Z_t(H,R)}\right).$
(3.20d)
These updates generalize the single-neuron updates, equation 3.17, with the population center $c$ taking the place of the location parameter $\theta$ and $Z_t(H,R)$ replacing $S_t(H,R)$. The single-neuron case is recovered when $\Sigma_{pop} = 0$.
It is illustrative to consider these equations in the scalar case $m = n = 1$, with $H = 1$. Letting $\sigma_t^2 = \Sigma_t$, $\alpha^2 = R^{-1}$, and $\sigma_{pop}^2 = \Sigma_{pop}$ yields
$d\mu_t^c = \frac{\sigma_t^2}{\sigma_t^2 + \alpha^2 + \sigma_{pop}^2}\left(\mu_t - c\right)\hat\lambda_t(f)\,dt,$
(3.21a)
$d\sigma_t^{2,c} = \frac{\sigma_t^2}{\sigma_t^2 + \alpha^2 + \sigma_{pop}^2}\left(1 - \frac{\left(\mu_t - c\right)^2}{\sigma_t^2 + \alpha^2 + \sigma_{pop}^2}\right)\sigma_t^2\,\hat\lambda_t(f)\,dt,$
(3.21b)
where
$\hat\lambda_t(f) = h\sqrt{2\pi\alpha^2}\;\mathcal{N}\!\left(\mu_t;\, c,\; \sigma_t^2 + \alpha^2 + \sigma_{pop}^2\right).$

Figure 5a demonstrates the continuous update terms, equation 3.21, as a function of the current mean estimate $\mu_t$, for various values of the population variance $\sigma_{pop}^2$, including the case of a single neuron, $\sigma_{pop}^2 = 0$. The continuous update term $d\mu_t^c$ pushes the posterior mean $\mu_t$ away from the population center $c$ in the absence of spikes. This effect weakens as $|\mu_t - c|$ grows, due to the factor $\hat\lambda_t(f)$, consistent with the idea that far from $c$, the lack of events is less surprising, hence less informative. The continuous variance update term $d\sigma_t^{2,c}$ increases the variance when $\mu_t$ is near $c$ and otherwise decreases it. This stands in contrast with the Kalman-Bucy filter, where the posterior variance cannot increase when estimating a static state.
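The scalar filter of equation 3.21 can be sketched end to end: an OU state is simulated, spikes are drawn from the population's marked point process by thinning, and the posterior is propagated with the prediction, continuous, and spike updates. All parameter values below are hypothetical (loosely modeled on the figure examples), and the Euler discretization is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, loosely modeled on Figure 6.
a, d = -0.1, 1.0                 # state dynamics: dX = a X dt + d dW
h, alpha2 = 10.0, 0.25           # tuning height and width (alpha^2 = R^{-1})
c, spop2 = 0.0, 4.0              # population center and variance
dt, T = 1e-3, 10.0

x = np.sqrt(d**2 / (-2 * a)) * rng.standard_normal()  # steady-state sample
mu, var = 0.0, d**2 / (-2 * a)                        # filter prior

for _ in range(int(T / dt)):
    # Simulate the state and draw spikes by thinning.
    x += a * x * dt + d * np.sqrt(dt) * rng.standard_normal()
    total_rate = h * np.sqrt(alpha2 / (alpha2 + spop2)) * np.exp(
        -0.5 * (x - c) ** 2 / (alpha2 + spop2))
    spiked = rng.random() < total_rate * dt
    # Prediction plus the continuous updates of equation 3.21.
    z = var + alpha2 + spop2
    lam_f = h * np.sqrt(alpha2 / z) * np.exp(-0.5 * (mu - c) ** 2 / z)
    mu += a * mu * dt + (var / z) * (mu - c) * lam_f * dt
    var += (2 * a * var + d**2) * dt \
        + (var / z) * (1 - (mu - c) ** 2 / z) * var * lam_f * dt
    if spiked:
        # Mark of the spike: preferred stimulus of the firing neuron.
        prec = 1 / alpha2 + 1 / spop2
        theta = rng.normal((x / alpha2 + c / spop2) / prec, np.sqrt(1 / prec))
        # Discontinuous updates (equations 3.16c and 3.16d with H = 1).
        mu += var * (theta - mu) / (alpha2 + var)
        var = var * alpha2 / (alpha2 + var)
```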

Figure 5:

Continuous update terms as a function of the current posterior mean estimate, for a 1D state observed through a population of gaussian neurons, equation 3.3. The population density is gaussian in the left plot and uniform on the interval $[-1, 1]$ in the right plot. The bottom plots show the population density $f(d\theta)/d\theta$ and a tuning function $\lambda(x; \theta)$ for $\theta = 0$.


3.2.4  Uniform Population on an Interval

In this case, we assume a scalar state, $n=m=1$, and
$f(d\theta) = \mathbf{1}_{[a,b]}(\theta)\,d\theta,$
(3.22)
where, similar to the gaussian population case, $h$ and $R$ are fixed. Unlike the gaussian case, here we find it more convenient not to normalize the distribution. Since the state is assumed to be scalar, let $\sigma_t^2 = \Sigma_t$ and $\alpha^2 = R^{-1}$. The continuous updates for this case are
$d\mu_t^c = h\sqrt{2\pi\alpha^2}\,\frac{\sigma_t^2}{\sqrt{\sigma_t^2 + \alpha^2}}\left[\phi(b_t') - \phi(a_t')\right]dt,$
(3.23a)
$d\sigma_t^{2,c} = h\sqrt{2\pi\alpha^2}\,\frac{\sigma_t^4}{\sigma_t^2 + \alpha^2}\left[b_t'\,\phi(b_t') - a_t'\,\phi(a_t')\right]dt,$
(3.23b)
where
$a_t' \triangleq \frac{a - \mu_t}{\sqrt{\sigma_t^2 + \alpha^2}}, \qquad b_t' \triangleq \frac{b - \mu_t}{\sqrt{\sigma_t^2 + \alpha^2}}, \qquad \phi(x) \triangleq \mathcal{N}(x; 0, 1),$
and the posterior rates are given by
$\hat\lambda_t(\theta) = h\sqrt{2\pi\alpha^2}\;\mathcal{N}\!\left(\theta;\,\mu_t,\,\alpha^2 + \sigma_t^2\right), \qquad \hat\lambda_t(f) = h\sqrt{2\pi\alpha^2}\int_{a_t'}^{b_t'}\phi(u)\,du.$

Figure 5b demonstrates the continuous update terms, equation 3.23, as a function of the current mean estimate $\mu_t$. When the mean estimate is near an end point of the interval, the mean update term $d\mu_t^c$ pushes the posterior mean outside the interval in the absence of spikes. The posterior variance $\sigma_t^2$ decreases outside the interval, where the absence of spikes is expected, and increases inside the interval, where it is unexpected. When the posterior mean is not near the interval end points, the updates are near zero, consistent with the uniform population case, equation 3.18.
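The following sketch implements the continuous updates of equation 3.23 for the interval population (scalar state; coefficients as written above, with hypothetical parameter values) and reproduces the qualitative behavior just described:

```python
import math

def continuous_updates_interval(mu, var, h, alpha2, a, b):
    """Continuous ADF updates (equation 3.23) for a uniform population of
    gaussian neurons with preferred stimuli on [a, b]; scalar state."""
    s2 = var + alpha2
    at = (a - mu) / math.sqrt(s2)
    bt = (b - mu) / math.sqrt(s2)
    phi = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    coef = h * math.sqrt(2.0 * math.pi * alpha2)
    d_mu = coef * var / math.sqrt(s2) * (phi(bt) - phi(at))
    d_var = coef * var**2 / s2 * (bt * phi(bt) - at * phi(at))
    return d_mu, d_var

# Deep inside a wide interval the updates vanish (uniform-coding limit);
# near the right end point the mean is pushed outside the interval.
mid = continuous_updates_interval(0.0, 0.5, 1.0, 0.01, -10.0, 10.0)
edge = continuous_updates_interval(9.9, 0.5, 1.0, 0.01, -10.0, 10.0)
```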

Finite mixtures. Note that the continuous updates, equations 3.16a and 3.16b, are linear in $f$. Accordingly, if $f(dy) = \sum_i \alpha_i f_i(dy)$, where each $f_i$ is of one of the above forms, the updates are obtained as the appropriate weighted sums of the filters derived above for the various special forms of $f_i$. This form is quite general: it includes populations where $\theta$ is distributed according to a gaussian mixture, as well as heterogeneous populations with finitely many different values of the shape matrices $H, R$. The resulting filter includes a term for each component of the mixture.

4  Numerical Evaluation

Since the filter, equation 3.16, is based on an assumed density approximation, its results may be inexact. We tested the accuracy of the filter in the gaussian population case (see equation 3.20), by numerical comparison with particle filtering (PF; Doucet & Johansen, 2009).

Figure 6 shows two examples of filtering a one-dimensional process observed through a gaussian population (see equation 3.19) of gaussian neurons (see equation 3.3), using both the ADF approximation (see equation 3.20) and a PF for comparison. (See the figure caption for details.) Figure 7 shows the distribution of approximation errors and the deviation of the posterior from gaussian. The approximation errors plotted are the relative error in the mean estimate, $\epsilon_\mu \triangleq \left(\mu_{ADF} - \mu_{PF}\right)/\sigma_{PF}$, and the relative error in the posterior standard deviation estimate, $\epsilon_\sigma \triangleq \left(\sigma_{ADF} - \sigma_{PF}\right)/\sigma_{PF}$, where $\mu_{ADF}, \sigma_{ADF}$ and $\mu_{PF}, \sigma_{PF}$ are the posterior means and standard deviations obtained from ADF and PF, respectively. The deviation of the posterior distribution from gaussian is quantified using the Kolmogorov-Smirnov (KS) statistic $\sup_x\left|F(x) - G(x)\right|$, where $F$ is the particle distribution cdf and $G$ is the cdf of a gaussian matching $F$'s first two moments. For comparison, the orange lines in Figure 7 show the distribution of this KS statistic under the hypothesis that the particles are drawn independently from a gaussian, which is known as the Lilliefors distribution (see Lilliefors, 1967). As seen in the figure, the gaussian posterior assumption underlying the ADF approximation is quite accurate despite the fact that the population is nonuniform. Accordingly, approximation errors are typically of a few percent (see Table 1).
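The KS statistic used here is straightforward to compute from a particle set. A sketch (the gaussian test sample below is synthetic, not data from the experiments described above):

```python
import numpy as np
from math import erf, sqrt

def ks_to_matched_gaussian(particles):
    """KS statistic sup_x |F(x) - G(x)| between the particle empirical cdf F
    and a gaussian G matching the particles' first two moments."""
    x = np.sort(np.asarray(particles, float))
    n = x.size
    m, s = x.mean(), x.std(ddof=1)
    # gaussian cdf via the error function
    G = np.array([0.5 * (1.0 + erf(v)) for v in (x - m) / (s * sqrt(2.0))])
    F_hi = np.arange(1, n + 1) / n      # empirical cdf just after each point
    F_lo = np.arange(0, n) / n          # empirical cdf just before each point
    return float(max(np.max(np.abs(F_hi - G)), np.max(np.abs(F_lo - G))))

# For particles that really are gaussian, the statistic is small; its null
# distribution is the Lilliefors distribution mentioned in the text.
rng = np.random.default_rng(1)
ks = ks_to_matched_gaussian(rng.normal(2.0, 3.0, size=10_000))
```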

Figure 6:

Two examples of a linear one-dimensional process observed through a gaussian population (see equation 3.19) of gaussian neurons (see equation 3.3), filtered using the ADF approximation (see equations 3.8, 3.16a, 3.16b, and 3.20) and using a particle filter. Each dot corresponds to a spike, with the vertical location indicating the neuron's preferred stimulus $\theta$. The approximate posterior means obtained from ADF and particle filtering are shown in blue and red, respectively, with the corresponding posterior standard deviations in shaded areas of the respective colors. The curves to the right of the graph show the preferred stimulus density (black) and a neuron's tuning function, equation 3.3, centered at $\theta = 0$ (gray), arbitrarily normalized to the same height for visualization. The bottom graph shows the posterior variance. Parameters used in both examples: $a = -0.1$, $d = 1$, $H = 1$, $\sigma_{pop}^2 = 4$, $R^{-1} = 0.25$, $c = 0$, $\mu_0 = 0$, $\sigma_0^2 = 1$ (note the different extent of the time axes). The observed processes were initialized from their steady-state distribution. The dynamics were discretized with time step $\Delta t = 10^{-3}$. The particle filter uses 1000 particles with systematic resampling (see, e.g., Doucet & Johansen, 2009) at each time step.


Figure 7:

Two-dimensional histograms of approximation errors (relative to the particle filter) versus the KS statistic of the particle distribution, with color indicating the count of time steps falling within each two-dimensional bin. The KS statistic is plotted against the estimation errors $\epsilon_\mu$, $\epsilon_\sigma$ (see main text). For clearer visualization, trials were omitted from the 2D histograms where the respective errors are below their 0.5th percentile value or above their 99.5th percentile value, as well as trials where the KS statistic is above its 99.5th percentile value. The 1D histograms above and to the right of each plot show the distributions of the KS statistic, of $\epsilon_\mu$, and of $\epsilon_\sigma$. Orange lines represent the theoretical distribution of the KS statistic for gaussian data (the Lilliefors distribution), obtained via Monte Carlo simulation. Data were collected from 100 trials of 1000 time steps with step size $\Delta t = 10^{-3}$, using 10,000 particles with systematic resampling at each time step.


Figure 8 shows an example of filtering a two-dimensional process with dynamics
$dX_t = \begin{pmatrix} 0 & 1 \\ 0 & -0.1 \end{pmatrix} X_t\,dt + \begin{pmatrix} 0 \\ 1 \end{pmatrix} dW_t,$
(4.1)
which may be interpreted as the position and velocity of a particle subject to friction proportional to its velocity, as well as a gaussian white noise external force. In this example, only the position is directly observed by the neural population. (Additional details are given in the figure caption.) The distribution of approximation errors and the KS statistic in this two-dimensional setting is shown in Figure 9. The approximation errors plotted are $\epsilon_\mu$ and $\epsilon_\sigma$ as defined above; both these errors and the KS statistic are computed separately for each dimension.
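A minimal Euler-Maruyama simulation of equation 4.1 (no filtering, just the state process; the step size matches the one quoted in the figure captions):

```python
import numpy as np

# Position-velocity dynamics of equation 4.1: friction proportional to the
# velocity, with a white-noise force acting on the velocity only.
A = np.array([[0.0, 1.0],
              [0.0, -0.1]])
B = np.array([0.0, 1.0])
dt, n_steps = 1e-3, 10_000
rng = np.random.default_rng(0)

X = np.zeros(2)                          # [position, velocity]
traj = np.empty((n_steps, 2))
for i in range(n_steps):
    X = X + A @ X * dt + B * np.sqrt(dt) * rng.standard_normal()
    traj[i] = X
```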
Figure 8:

An example two-dimensional process (see equation 4.1) observed through a gaussian population (see equation 3.19) of gaussian neurons (see equation 3.3; also see Figure 6). The sensory population is gaussian (see equations 3.3 and 3.19), with $H = (1, 0)$, so only the particle position is directly observed. Other sensory parameters are $\alpha^2 = R^{-1} = 0.25$, $h = 10$, $\Sigma_{pop} = 4$. The process was initialized from $\mathcal{N}\left([0,0]^\top, I_2\right)$, where $I_2$ is the $2\times 2$ identity matrix. Other details are as in Figure 6.


Figure 9:

Two-dimensional histograms of approximation errors (relative to particle filter) versus KS statistic of particle distribution in a two-dimensional example (see equation 4.1). Details are as in Figure 7.


Statistics of the estimation error distribution for these examples are provided in Table 1.

Table 1:
Approximation Errors Relative to PF in the Examples of Figures 6 and 8.
(a) 1D Example (Figure 6)

                        h = 1000                h = 2
                        εμ         εσ           εμ          εσ
Median                 −0.00272    1.29×10⁻⁴   −2.84×10⁻⁴   2.96×10⁻⁴
5th percentile         −0.0601    −0.0185     −0.0184     −0.0245
95th percentile         0.0482     0.0192      0.0186      0.0178
Mean                   −0.00415    1.41×10⁻⁴    3.34×10⁻⁴  −9.35×10⁻⁴
SD                      0.0345     0.0126      0.0119      0.0122
Median absolute value   0.0188     0.00722     0.00662     0.00766
Mean absolute value     0.0251     0.00919     0.0086      0.00942

(b) 2D Example (Figure 8)

                        Position                Velocity
                        εμ         εσ           εμ          εσ
Median                 −1.32×10⁻⁴  2.28×10⁻⁴   −3.37×10⁻⁴  −1.53×10⁻⁴
5th percentile         −0.0337    −0.0253     −0.0234     −0.0148
95th percentile         0.0361     0.0257      0.0258      0.0154
Mean                   −0.00101    2.95×10⁻⁵    1.51×10⁻⁵   2.95×10⁻⁵
SD                      0.0236     0.0157      0.0169      0.00922
Median absolute value   0.0115     0.00920     0.00908     0.00564
Mean absolute value     0.0163     0.0118      0.0121      0.00711

5  Encoding

We demonstrate the use of the assumed density filter in determining optimal encoding strategies: selecting the optimal population parameters $ϕ$ (see section 2). To illustrate the use of ADF for the encoding problem, we consider two simple examples. We also use the first example as a test for the filter's robustness. We will study optimal encoding issues in more detail in a subsequent paper.

5.1  Optimal Encoding Depends on Prior Variance

Previous work using a finite neuron population and a Fisher information-based criterion (Harper & McAlpine, 2004) has suggested that the optimal distribution of preferred stimuli depends on the prior variance. When it is small relative to the tuning width, optimal encoding is achieved by placing all preferred stimuli at a fixed distance from the prior mean. When the prior variance is large relative to the tuning width, optimal encoding is uniform (see Figure 2 in Harper & McAlpine, 2004). These results are consistent with biological observations reported in Brand, Behrend, Marquardt, McAlpine, & Grothe (2002) concerning the encoding of aural stimuli.

Similar results are obtained with our model, as shown in Figure 10. Whereas Harper and McAlpine (2004) implicitly assumed a static state in the computation of Fisher information, we use a time-varying scalar state. The state obeys the dynamics
$dX_t = aX_t\,dt + s\,dW_t, \qquad a < 0,$
(5.1)
and is observed through a gaussian population (see equation 3.19) and filtered using the ADF approximation. In this case, optimal encoding is interpreted as the simultaneous optimization of the population center $c$ and the population variance $\sigma_{pop}^2$. The process is initialized from its steady-state distribution, so that the prior is constant in time, with variance $s^2/(-2a)$. In Figure 10 (left), the steady-state prior distribution is narrow relative to the tuning width, leading to an optimal population with a narrow population distribution far from the origin. In Figure 10 (right), the prior is wide relative to the tuning width, leading to an optimal population whose variance roughly matches the prior variance.
Figure 10:

Optimal population distribution depends on the prior variance relative to the tuning function width. A scalar state with dynamics $dX_t = aX_t\,dt + s\,dW_t$ ($a = -0.05$) is filtered with tuning parameters $h = 50$, $\alpha = 1$ and preferred stimulus density $\mathcal{N}(c, \sigma_{pop}^2)$. The process is initialized from its steady-state distribution, $\mathcal{N}\left(0, s^2/(-2a)\right)$. Both graphs show the posterior standard deviation derived from the ADF approximation relative to the prior standard deviation $\sigma_p$. In the left graph, $s = 0.1$, so the prior variance is 0.1, whereas on the right, $s = 0.5$, so the prior variance is 2.5. In both cases, the filter is initialized with the correct prior, and the posterior variance is averaged over the time interval $[1, 2]$ and across 1000 trials for each data point. Only nonnegative values of $c$ were simulated, but note that the process is symmetric about zero, so the full plots would also be symmetric about $c = 0$. The areas in white in the right plot correspond to parameters where the computed posterior variance exceeded the prior variance. This is due to the poor performance of the ADF approximation for these parameter values in cases where no spikes occur and the true posterior becomes bimodal.


Note that we are optimizing only the two parameters $c,σpop2$ rather than each preferred stimulus as in Harper and McAlpine (2004). This mitigates the considerable additional computational cost due to simulating the decoding process rather than computing Fisher information. This direct simulation, though more computationally expensive, offers two advantages over the Fisher information-based method used in Harper and McAlpine (2004) and prevalent in computational neuroscience. First, the simple computation of Fisher information from tuning functions, commonly used in the neuroscience literature, is based on the assumption of a static state, whereas our method can be applied in a fully dynamic context, including the presence of observation-dependent feedback. Second, simulations of the decoding process allow for the minimization of arbitrary criteria, including the direct minimization of posterior variance or mean square error (MSE). As discussed in section 1, Fisher information may be a poor proxy for the MSE for finite decoding times.

We also test the robustness of the filter to inaccuracies in model parameters or observation/encoding parameters in this problem. We use inaccurate values for the model parameter $a$ in equation 5.1 and the observation/encoding parameter $h$ in equation 3.13 in the filter. Specifically, we multiply or divide each of the two parameters by a factor of $5/4$ in the filter, while the dynamics and the observing neural population remain unaltered. The results remain qualitatively similar, as seen in Figure 11.

Figure 11:

Robustness of the encoding result of Figure 10 to decoding with an inaccurate model of the dynamics or of the observation and encoding process. As in Figure 10, color indicates the posterior standard deviation derived from the ADF approximation relative to the prior standard deviation. The values of the dynamics parameter $a$ and the population parameter $h$ used by the filter, denoted $\hat a$ and $\hat h$, respectively, are varied while the dynamics and observation processes remain the same. Each row corresponds to a value of $\hat a$ and each column to a value of $\hat h$. White crosses denote the values where the posterior standard deviation is minimized.


5.2  Adaptation of Homogeneous Population to Stimulus Statistics

Benucci, Saleem, and Carandini (2013) measured the tuning of cat primary visual cortex (V1) neurons to oriented gratings after adapting to either a uniform distribution of orientations or a distribution with one orientation being more common. The population's preferred stimuli are roughly uniformly distributed and adapt to the prior statistics through change in amplitude. The adapted population was observed to have a decreased firing rate near the common orientation so that the mean firing rate is constant across the population.

We present a simplified model where optimal encoding exhibits a constant mean firing rate across a neural population. A random state $X$ is drawn from a prior distribution that is a mixture of two uniform distributions on intervals sharing an end point (see Figure 12a),
$p(x)=\sum_{i=1}^{2}k_i\frac{\mathbf{1}_{a_i\le x\le b_i}}{b_i-a_i},\qquad k_1+k_2=1,\quad b_1=a_2.$
(5.2)
The neural population consists of gaussian neurons (see equation 3.13) with preferred locations uniformly distributed on the interval $[a_1,b_2]$. However, they are allowed to adopt different tuning amplitudes $h_1,h_2$ in the subintervals $[a_1,b_1]$ and $[a_2,b_2]$, respectively. Thus, the population distribution is given by a mixture of two uniform components,
$f(dy)=f(dh,d\theta,dH',dR')=\delta_H(dH')\,\delta_R(dR')\sum_i\delta_{h_i}(dh)\,f_i(d\theta),\qquad f_i(d\theta)\triangleq\mathbf{1}_{a_i\le\theta\le b_i}\,d\theta,$
(5.3)
where $H=1$. We optimize the parameters $h1,h2$ to minimize accumulated decoding MSE over a finite decoding interval under a constraint on the total firing rate of the population,
$\min_{h_1,h_2}E\left[\int_0^T\left(X-\hat{X}_t\right)^2dt\right]\quad\text{s.t.}\quad r\triangleq E\left[\int\lambda(X;y)\,f(dy)\right]\le\bar{r},$
(5.4)
where $\hat{X}_t=E[X|N_t]$ is the MMSE estimate of $X$. The total firing rate $r$ may be obtained from equations 3.13, 5.2, and 5.3, yielding
$r=r_1+r_2,\qquad r_i=\sum_{j=1}^{2}k_j\int_{a_j}^{b_j}\frac{dx}{b_j-a_j}\int_{a_i}^{b_i}d\theta\,h_i\exp\left(-\frac{(x-\theta)^2}{2\alpha^2}\right)=\sqrt{2\pi}\,\alpha^2\sum_{j=1}^{2}\frac{k_jh_i}{b_j-a_j}\left[\Phi_1\!\left(\frac{a_j-b_i}{\alpha}\right)+\Phi_1\!\left(\frac{b_j-a_i}{\alpha}\right)-\Phi_1\!\left(\frac{a_j-a_i}{\alpha}\right)-\Phi_1\!\left(\frac{b_j-b_i}{\alpha}\right)\right],$
(5.5)
where $r_i$ is the total firing rate of the $i$th subpopulation, $\alpha^2=R^{-1}$, $\Phi_1(x)\triangleq\int_{-\infty}^{x}\Phi(u)\,du=x\Phi(x)+\phi(x)$, and $\phi,\Phi$ are, respectively, the pdf and cdf of the standard gaussian distribution. In this continuous population approximation, the firing rate of a single neuron with preferred stimulus $\theta$ is proportional to $E[\lambda(X;\theta)]$. We use narrow tuning ($\alpha=0.1$), so that the rate $E[\lambda(X;\theta)]$ is nearly constant within each subpopulation, justifying the approximation
$E[\lambda(X;\theta)]\approx\frac{1}{b_i-a_i}\int_{a_i}^{b_i}E[\lambda(X;\theta)]\,d\theta=\frac{r_i}{b_i-a_i},\qquad a_i\le\theta\le b_i.$
(5.6)
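As a numerical sanity check, the following sketch compares a closed-form evaluation of $r_i$ (using $\Phi_1(x)=x\Phi(x)+\phi(x)$ and a $\sqrt{2\pi}\,\alpha^2$ prefactor, which follows from integrating the gaussian tuning curve over both $x$ and $\theta$) against a brute-force double integral. The parameter values are hypothetical, chosen only for illustration:

```python
import math

def Phi(x):    # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):    # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi1(x):   # Phi_1(x) = integral of Phi up to x = x*Phi(x) + phi(x)
    return x * Phi(x) + phi(x)

def rate_closed_form(i, h, k, a, b, alpha):
    """Total rate r_i of subpopulation i via the closed form of equation 5.5."""
    total = 0.0
    for j in range(2):
        bracket = (Phi1((a[j] - b[i]) / alpha) + Phi1((b[j] - a[i]) / alpha)
                   - Phi1((a[j] - a[i]) / alpha) - Phi1((b[j] - b[i]) / alpha))
        total += k[j] / (b[j] - a[j]) * bracket
    return math.sqrt(2.0 * math.pi) * alpha ** 2 * h[i] * total

def rate_numeric(i, h, k, a, b, alpha, n=400):
    """Brute-force midpoint-rule evaluation of the same double integral."""
    total = 0.0
    for j in range(2):
        dx, dth = (b[j] - a[j]) / n, (b[i] - a[i]) / n
        s = 0.0
        for p in range(n):
            x = a[j] + (p + 0.5) * dx
            for q in range(n):
                th = a[i] + (q + 0.5) * dth
                s += math.exp(-(x - th) ** 2 / (2.0 * alpha ** 2))
        total += k[j] * h[i] / (b[j] - a[j]) * s * dx * dth
    return total

# hypothetical parameters mirroring the structure of Figure 12
k, a, b = [0.1, 0.9], [-1.0, 0.0], [0.0, 1.0]
h, alpha = [30.0, 10.0], 0.1
```

The midpoint grid is much finer than the tuning width $\alpha$, so the two evaluations agree to high accuracy.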
To solve this optimization problem, we approximate $\hat{X}_t$ by the ADF filter output $\mu_t$. We further assume that the MSE is a decreasing function of the total firing rate $r$, so that the solution is obtained for $r=\bar{r}$. Although we have no proof of this claim, it appears reasonable, since the posterior variance decreases at each spike (see section 3.1.5). The stronger constraint $r=\bar{r}$ leaves a single degree of freedom, the amplitude ratio $h_1/h_2$. In Figure 12b we evaluate $E[\int_0^T(X-\mu_t)^2\,dt]$ using Monte Carlo simulation for various values of the ratio $h_1/h_2$, as well as of the interval end point $b_2$. Although the optimization is constrained only by the total firing rate, the minimal MSE is obtained near the solution of
$\frac{r_1}{b_1-a_1}=\frac{r_2}{b_2-a_2},$
where firing rates are equalized across the population.
Figure 12:

Optimal population in the mixture model equalizes firing rates of different neurons. (a) Prior and neural population. (b) Estimation MSE as a function of height ratio $h_1/h_2$ and interval end point $b_2$. Parameters are $k_1=0.1$, $k_2=0.9$, $a_1=-1$, $b_1=a_2=0$, $T=1$, $\bar{r}=10$, $\alpha^2=0.01$, $H=1$. The dotted white line indicates the ratio where firing rates are equalized between the two subpopulations according to the approximation in equation 5.6, and the orange line the ratio where the MSE is minimized. The MSE was computed by averaging across 10,000 trials for each data point.


6  Comparison to Previous Work

6.1  Neural Decoding

Table 2 provides a concise comparison of our setting and results to previous work on optimal neural decoding. As noted in section 1, we focus our comparison on analytically expressible continuous-time filters.

Table 2:
Summary of the Setting and Results of Several Previous Works Based on Continuous-Time Filtering of Point-Process Observations, in Comparison to Our Work.
In the table, the Dynamics, Rates, and Population columns describe the neural code, and the Exact and Complexity columns describe the decoding.

| Reference | Dynamics | Rates | Population | Exact | Complexity |
| --- | --- | --- | --- | --- | --- |
| Snyder (1972); Segall, Davis, and Kailath (1975) | Diffusion | Any | Finite$^a$ | ✓ | $\infty$ |
| Snyder (1972)$^b$ | Diffusion | Any | Finite | – | $n$ |
| Rhodes and Snyder (1977) | lin. G. diff. | Gaussian | Uniform | ✓ | 1 |
| Komaee (2010) | lin. G. diff. | Any | Uniform | – | 1 |
| Huys et al. (2007) | 1d G. process | Gaussian | Uniform | ✓ | $N_t^3$$^c$ |
| Eden and Brown (2008) | lin. G. diff. | Any | Finite | – | $n$ |
| Susemihl et al. (2011, 2013) | 1d G. Mat. | Gaussian | Uniform | – | 1 |
| Bobrowski et al. (2009); Twum-Danso and Brockett (2001) | CTMC | Any | Finite | ✓ | $n$ |
| This work | Diffusion | Gaussian | Nonuniform | – | 1$^d$ |

Notes: The complexity column lists the asymptotic computational complexity of each time step in a discrete-time implementation of the filter, as a function of population size $n$ and number of spikes $N_t$. A complexity of $\infty$ denotes an infinite-dimensional filter. Lin. G. diff.: linear gaussian diffusion; 1d G. Mat.: one-dimensional gaussian process with Matérn kernel autocorrelation; CTMC: continuous-time Markov chain.

$^a$The setting of Snyder (1972) and Segall et al. (1975) is a single observation point process, but the result is readily extended to a finite population by summation.

$^b$Snyder (1972) includes both an exact PDE for the posterior statistics and an approximate solution.

$^c$This filter is nonrecursive, and its complexity grows with the history of spikes. The exponent 3 is related to the inversion of an $N_t\times N_t$ matrix, which in principle can be performed with lower complexity.

$^d$The complexity is constant for distributions over preferred stimuli as described in section 3.2.3 (including the uniform coding case). For heterogeneous mixture populations, the complexity is linear in the number of mixture components.

Few previous analytically oriented works studied neural decoding for dynamic stimuli or without the uniform coding assumption. The filtering problem for a dynamic state with a general (nonuniform) finite population is tractable when the state space is finite; in this case, the posterior is finite-dimensional and obeys SDEs derived in Brémaud (1981). These results have been presented in a neuroscience context in Twum-Danso and Brockett (2001) and Bobrowski et al. (2009).

We are aware of a single previous work (Eden & Brown, 2008) that derived a closed-form filter for nonuniform coding of a diffusion process in continuous time. The setting of Eden and Brown (2008) is a finite population with arbitrary tuning functions, and the derivation uses an approximation similar to the one used in our letter. Our work differs from Eden and Brown (2008) most notably in the use of a parameterized "infinite" population. In this sense, the setting of Eden and Brown (2008) is less general, corresponding to the case of a finite population described in section 3.1. On the other hand, Eden and Brown (2008) allow a more general form for the tuning functions. Although they also approximate the posterior using a gaussian distribution, their filter is not equivalent to ours. The two filters are compared in detail in appendix D.

6.2  Neural Encoding

As the work we report in this letter is concerned with efficient neural decoding as a tool for studying neural encoding in nonuniform populations, we briefly mention some work that studies neural encoding analytically. As we have noted, we will study encoding in more detail in a subsequent paper. Much previous work used Fisher information to study nonuniform coding of static stimuli. As we noted in section 1, Fisher information does not necessarily provide a good estimate of achievable decoding performance, and optimizing it may yield qualitatively misleading results (see Bethge et al., 2002; Pilarski & Pokora, 2015; Yaeli & Meir, 2010). However, we note one such work, Ganguli and Simoncelli (2014), that is particularly relevant in comparison to our work, as it similarly studies large parameterized populations. The neural populations Ganguli and Simoncelli (2014) studied are characterized by two functions of the stimulus: a density function, which describes the local density of neurons, and a "gain" function, which modulates each neuron's gain according to its preferred location. The density function also distorts tuning functions so that tuning width is approximately inversely proportional to neuron density, resulting in approximately uniform coding when the gain function is constant. However, the introduction of the gain function produces a violation of uniform coding. This population is optimized for Fisher information and several related measures. For each of these measures, the optimal population density is shown to be monotonic in the prior: more neurons should be assigned to more probable states. This is in contrast to the results we present in section 5 (e.g., see Figure 10, left), where the optimal population distribution is shifted relative to the prior distribution when the prior is narrow.
This discrepancy may be attributed to the limitations of Fisher information in predicting actual decoding error or to the coupling of neuron density and tuning width used in Ganguli and Simoncelli (2014) to facilitate the derivation of closed-form solutions.

Some previous work attempted direct minimization of decoding MSE rather than Fisher information. Yaeli and Meir (2010) derived an explicit expression for the MSE in the static case with uniform coding and used it to characterize optimal tuning function width and its relation to coding time. More recently, Wang, Stocker, and Lee (2016) studied $Lp$-based loss measures and presented exact results for optimal tuning functions in the case of univariate static signals, for single neurons and homogeneous populations. Susemihl et al. (2011, 2013) suggested a mean-field approximation to allow efficient evaluation of the MSE in a dynamic setting.

Notes

1

Although we describe this work using neuroscience terminology, the motivation and formulation in Rhodes and Snyder (1977) is not related to neuroscience.

2
For an introduction to SDEs and the Wiener process see, for example, Øksendal (2003). Intuitively, equation 3.1 may be interpreted as a differential equation with continuous-time gaussian white noise $\xi_t$,
$\dot{X}_t=A(X_t)+D(X_t)\,\xi_t,$
or as the limit as $\Delta t\to0$ of the discretized dynamics,
$X_{(k+1)\Delta t}=X_{k\Delta t}+A(X_{k\Delta t})\,\Delta t+D(X_{k\Delta t})\,\xi_k\sqrt{\Delta t},$
where $\xi_k$ are independent standard gaussian variables.
3

In the sense that any two solutions $X_t^1,X_t^2$ defined over $[0,T]$ with $X_0^1=X_0^2$ satisfy $P\left[X_t^1=X_t^2\;\forall t\in[0,T]\right]=1$. See, e.g., Øksendal (2003, theorem 5.2.1) for sufficient conditions.

4

If equation 3.1 includes an additional feedback (control) term, the $N^i$ are not DSPPs. Our results apply with little modification to this case, as described in appendix A.

5

See Segall and Kailath (1975, theorem 2).

6

The setting of Snyder (1972) includes a single observation point process. The extension to several point processes is obtained through summation as in equation 3.5 and is a special case of a more general PDE described in Rhodes and Snyder (1977) and discussed in appendix A.

7

This holds only approximately, when the tuning width is not too large relative to the size of the interval. For wider tuning functions, the behavior becomes similar to the single sensor case.

8

A more detailed discussion of this definition and of innovation processes is available in Segall and Kailath (1975).

9

That is, it is strictly a function of the history $Ft$.

10

The setting considered in Rhodes and Snyder (1977) assumes linear dynamics and an infinite uniform population. However, these assumptions are relevant only to establishing other propositions in that paper. The proof of equation 3.5 holds as is in our more general setting.

Appendix A:  Derivation of Filtering Equations

A.1  Setting and Notation

In the main text, we have presented our model in an open-loop setting, where the process $X$ is passively observed. Here we consider a more general setting, incorporating a control process $Ut$, so the dynamics are
$dX_t=\left(A(X_t)+BU_t\right)dt+D(X_t)\,dW_t,$
(A.1)
where, in general, $U_t$ is a function of $N_t$.

We denote by $p_t^N(\cdot)$ the posterior density—that is, the density of $X_t$ given $N_t$—and by $E_t^N[\cdot]$ the posterior expectation—that is, expectation conditioned on $N_t$.

A.2  The Innovation Measure

We derive the filtering equations in terms of the innovation measure of the marked point process, which is a random measure closely related to the notion of the innovation process associated with unmarked point processes or diffusion processes. The role of the innovation measure in filtering marked point processes is analogous to the role of the innovation process in Kalman filtering.

In section 3.1.1, we characterized the intensity (or rate) of each point process in a finite population using the asymptotic behavior of jump probabilities in small intervals. An alternative definition is the following.8 Consider an unmarked point process $N_t$, with $N_t$ also denoting its history up to time $t$. Given some history $F_t$ containing $N_t$ (e.g., $F_t$ might include the observation process $N$ and its driving state process $X$), the process $\lambda_t^F$ is called the intensity of $N$ relative to $F$ when $\lambda_t^F$ is $F_t$-measurable9 and
$I_t^F\triangleq N_t-\int_0^t\lambda_s^F\,ds$
is an $F_t$-martingale, meaning
$E[I_t^F\,|\,F_s]=I_s^F,$
or, equivalently,
$E[N_t\,|\,F_s]=N_s+E\left[\int_s^t\lambda_u^F\,du\,\Big|\,F_s\right].$
Heuristically, we may write this relation as
$E[dN_t\,|\,F_t]=\lambda_t^F\,dt.$
When $F_t=N_t$, the process $I_t\triangleq I_t^N$ is called the innovation process. We may write
$dN_t=\lambda_t^N\,dt+dI_t=E[dN_t\,|\,N_t]+dI_t,$
so the innovation increments $dI_t$ represent the unexpected part of the increments $dN_t$ after observation of $N_t$. The innovation process may be similarly defined for other stochastic processes (such as discrete-time or diffusion processes), and it plays an important role in the Kalman and Kalman-Bucy filters. It plays an analogous role in the filtering of point processes, as seen below.
As discussed in section 3.2.2, in the continuous population model, we characterize the observation process $N$ by its intensity kernel relative to the history $(N_t,X_{0,t})$:
$E[N(dt,dy)\,|\,X_{0,t},N_t]=\lambda(X_t;y)\,f(dy)\,dt.$
This equation is a heuristic notation for the statement that the rate of $N_t(Y)$ relative to $(N_t,X_{0,t})$ is $\int_Y\lambda(X_t;y)\,f(dy)$, for any measurable $Y\subset\mathcal{Y}$. The rate relative to the spiking history $N_t$ is obtained by marginalizing over $X_t$ (see Segall & Kailath, 1975, theorem 2), yielding the rate $\int_Y\hat{\lambda}_t(y)\,f(dy)$, where
$\hat{\lambda}_t(y)=E_t^N[\lambda(X_t;y)].$
Therefore, the innovation process of $N_t(Y)$ is $I_t(Y)=N_t(Y)-\int_0^t\int_Y\hat{\lambda}_s(y)\,f(dy)\,ds$, and accordingly we define the innovation measure $I$ to be the random measure
$I(dt,dy)\triangleq N(dt,dy)-\hat{\lambda}_t(y)\,f(dy)\,dt,$
(A.2)
so that the innovation process may be expressed as
$I_t(Y)=\int_0^t\int_Y I(ds,dy).$
The martingale property of $I_t(Y)$ implies that the innovation measure satisfies
$E_t^N\left[\int_Y I(dt,dy)\right]=0$
for all measurable $Y\subset\mathcal{Y}$.

A.3  Exact Filtering Equations

Define
$\omega_t(x;y)\triangleq\frac{\lambda(x;y)}{\hat{\lambda}_t(y)}-1,\qquad\omega_t(y)\triangleq\omega_t(X_t;y).$
The stochastic PDE, equation 3.5, is extended in Rhodes and Snyder (1977) to the case of marked point-process observations in the presence of feedback.10 In this case, the posterior density $p_t^N$ obeys the equation
$dp_t^N(x)=\mathcal{L}_t^*p_t^N(x)\,dt+p_{t^-}^N(x)\int_{y\in\mathcal{Y}}\omega_{t^-}(x;y)\,I(dt,dy).$
(A.3)
Here $\mathcal{L}_t$ is the posterior infinitesimal generator, defined with an additional conditioning on $N_t$,
$\mathcal{L}_th(x)=\lim_{\Delta t\to0^+}\frac{E\left[h(X_{t+\Delta t})\,|\,X_t=x,N_t\right]-h(x)}{\Delta t},$
and $\mathcal{L}_t^*$ is its adjoint. Note that in this closed-loop setting, the infinitesimal generator is itself a random operator, due to its dependence on past observations through the control law, and that $N_t$ is no longer a doubly stochastic Poisson process.
As in section 3.1.3, we use the notations
$\mu_t\triangleq E_t^N[X_t],\qquad\tilde{X}_t\triangleq X_t-\mu_t,\qquad\Sigma_t\triangleq E_t^N[\tilde{X}_t\tilde{X}_t^\intercal].$
We derive the following equations for the first two posterior moments, which generalize equation 3.6 to marked point processes and to the presence of feedback in the state dynamics:
$d\mu_t=E_t^N[A(X_t)+BU_t]\,dt+\int_{\mathcal{Y}}E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]\,I(dt,dy),$
(A.4a)
$d\Sigma_t=E_t^N\left[A(X_t)\tilde{X}_t^\intercal+\tilde{X}_tA(X_t)^\intercal+D(X_t)D(X_t)^\intercal\right]dt+\int_{\mathcal{Y}}E_{t^-}^N[\omega_{t^-}(y)\tilde{X}_{t^-}\tilde{X}_{t^-}^\intercal]\,I(dt,dy)-\int_{\mathcal{Y}}E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]\,E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]^\intercal\,N(dt,dy).$
(A.4b)
A rigorous derivation of equation A.4a under more general conditions is found in Segall et al. (1975), from which equation A.4b may be derived by considering the dynamics of the process $X_tX_t^\intercal$. Here we provide a more heuristic derivation based on equation A.3.
Compare the mean update equation, A.4a, to the Kalman-Bucy filter, which gives the posterior moments for a diffusion process $X$ with noisy observations $Y$ of the form
$dX_t=A_tX_t\,dt+D_t\,dW_t,\qquad dY_t=H_tX_t\,dt+dV_t,$
where $W,V$ are independent standard Wiener processes. The Kalman-Bucy filter reads
$d\mu_t=A_t\mu_t\,dt+\Sigma_tH_t^\intercal\left(dY_t-H_t\mu_t\,dt\right),\qquad\dot{\Sigma}_t=A_t\Sigma_t+\Sigma_tA_t^\intercal+D_tD_t^\intercal-\Sigma_tH_t^\intercal H_t\Sigma_t.$
Here, the term $dY_t-H_t\mu_t\,dt$ appearing in the first equation is an increment of the innovation process $Y_t-\int_0^tH_s\mu_s\,ds$.
For a sufficiently well-behaved function $h$, we find, using equation A.3 and the definition of operator adjoint,
$dE_t^N[h(X_t)]=\int dx\,h(x)\left[\mathcal{L}_t^*p_t^N(x)\,dt+p_{t^-}^N(x)\int_{y\in\mathcal{Y}}\omega_{t^-}(x;y)\,I(dt,dy)\right]=\int dx\left[p_t^N(x)\,\mathcal{L}_th(x)\,dt+p_{t^-}^N(x)\,h(x)\int_{y\in\mathcal{Y}}\omega_{t^-}(x;y)\,I(dt,dy)\right]=E_t^N[\mathcal{L}_th(X_t)]\,dt+\int_{y\in\mathcal{Y}}E_{t^-}^N[h(X_{t^-})\,\omega_{t^-}(y)]\,I(dt,dy).$
(A.5)
Assuming the state evolves as in equation A.1, the (closed-loop) infinitesimal generator is
$\mathcal{L}_th(x)=\left(A(x)+BU_t\right)^\intercal\nabla h(x)+\tfrac{1}{2}\mathrm{Tr}\left[\nabla^2h(x)\,D(x)D(x)^\intercal\right],$
which, when specialized to the functions $h_i(x)=x_i$ and $h_{ij}(x)=x_ix_j$, where $x_i$ is the $i$th component of $x$, reads
$\mathcal{L}_th_i(x)=\left(A(x)+BU_t\right)_i,$
(A.6)
$\mathcal{L}_th_{ij}(x)=\left(A(x)+BU_t\right)_ix_j+x_i\left(A(x)+BU_t\right)_j+\tfrac{1}{2}\left(D(x)D(x)^\intercal\right)_{ij}+\tfrac{1}{2}\left(D(x)D(x)^\intercal\right)_{ji}=\left(A(x)+BU_t\right)_ix_j+x_i\left(A(x)+BU_t\right)_j+\left(D(x)D(x)^\intercal\right)_{ij}.$
(A.7)
Substituting equation A.6 into A.5 yields
$d\mu_t^i=E_t^N\left[\left(A(X_t)+BU_t\right)_i\right]dt+\int_{y\in\mathcal{Y}}E_{t^-}^N[X_{t^-}^i\,\omega_{t^-}(y)]\,I(dt,dy),$
or, in vector notation,
$d\mu_t=E_t^N[A(X_t)]\,dt+BU_t\,dt+\int_{y\in\mathcal{Y}}E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]\,I(dt,dy).$
(A.8)
To compute $d\Sigma_t$ we use the representation $d\Sigma_t=dE_t^N[X_tX_t^\intercal]-d(\mu_t\mu_t^\intercal)$. The first term is computed by substituting equation A.7 into A.5, yielding
$dE_t^N[X_t^iX_t^j]=E_t^N\left[\left(A(X_t)+BU_t\right)_iX_t^j+X_t^i\left(A(X_t)+BU_t\right)_j\right]dt+E_t^N\left[\left(D(X_t)D(X_t)^\intercal\right)_{ij}\right]dt+\int_{y\in\mathcal{Y}}E_{t^-}^N[X_{t^-}^iX_{t^-}^j\,\omega_{t^-}(y)]\,I(dt,dy),$
or, in matrix notation, after some rearranging,
$dE_t^N[X_tX_t^\intercal]=E_t^N\left[A(X_t)X_t^\intercal+X_tA(X_t)^\intercal\right]dt+\left(BU_t\mu_t^\intercal+\mu_t(BU_t)^\intercal\right)dt+E_t^N\left[D(X_t)D(X_t)^\intercal\right]dt+\int_{y\in\mathcal{Y}}E_{t^-}^N[\omega_{t^-}(y)X_{t^-}X_{t^-}^\intercal]\,I(dt,dy).$
(A.9)
To calculate $d(\mu_t\mu_t^\intercal)$ from equation A.8, we separately handle the continuous terms and the jump term involving $N(dt,dy)$. The continuous terms are the continuous part of $(d\mu_t)\mu_t^\intercal+\mu_t(d\mu_t)^\intercal$. To compute the jump terms, we note that when $\mu_t$ jumps by $\Delta_t$, the corresponding jump in $\mu_t\mu_t^\intercal$ is $\Delta_t\mu_{t^-}^\intercal+\mu_{t^-}\Delta_t^\intercal+\Delta_t\Delta_t^\intercal$; therefore,
$d(\mu_t\mu_t^\intercal)=\left(E_t^N[A(X_t)]\mu_t^\intercal+\mu_tE_t^N[A(X_t)]^\intercal\right)dt+\left(BU_t\mu_t^\intercal+\mu_t(BU_t)^\intercal\right)dt-\int_{y\in\mathcal{Y}}\left(E_t^N[\omega_t(y)X_t]\mu_t^\intercal+\mu_tE_t^N[\omega_t(y)X_t]^\intercal\right)\hat{\lambda}_t(y)\,f(dy)\,dt$
(A.10)
$+\int_{y\in\mathcal{Y}}\left(E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]\mu_{t^-}^\intercal+\mu_{t^-}E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]^\intercal+E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]\,E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]^\intercal\right)N(dt,dy).$
(A.11)
Subtracting equation A.11 from A.9, and noting that $E_t^N[\omega_t(y)]=0$, so that
$E_{t^-}^N[\omega_{t^-}(y)\tilde{X}_{t^-}\tilde{X}_{t^-}^\intercal]=E_{t^-}^N[\omega_{t^-}(y)X_{t^-}X_{t^-}^\intercal]-E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]\mu_{t^-}^\intercal-\mu_{t^-}E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]^\intercal,$
yields equation A.4b.
Writing equations A.4a and A.4b according to the decomposition described in section 3.1.3,
$d\mu_t^\pi=E_t^N[A(X_t)+BU_t]\,dt,\qquad d\Sigma_t^\pi=E_t^N\left[A(X_t)\tilde{X}_t^\intercal+\tilde{X}_tA(X_t)^\intercal+D(X_t)D(X_t)^\intercal\right]dt,\qquad d\mu_t^c=-\int_{y\in\mathcal{Y}}E_t^N[\omega_t(y)X_t]\,\hat{\lambda}_t(y)\,f(dy)\,dt,$
(A.12)
$d\Sigma_t^c=-\int_{y\in\mathcal{Y}}E_t^N[\omega_t(y)\tilde{X}_t\tilde{X}_t^\intercal]\,\hat{\lambda}_t(y)\,f(dy)\,dt,$
(A.13)
$d\mu_t^N=\int_{y\in\mathcal{Y}}E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]\,N(dt,dy),$
(A.14)
$d\Sigma_t^N=\int_{y\in\mathcal{Y}}\left(E_{t^-}^N[\omega_{t^-}(y)\tilde{X}_{t^-}\tilde{X}_{t^-}^\intercal]-E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]\,E_{t^-}^N[\omega_{t^-}(y)X_{t^-}]^\intercal\right)N(dt,dy).$
(A.15)

A.4  ADF Approximation for Gaussian Tuning

We now proceed to apply the gaussian ADF approximation $p_t^N(x)\approx\mathcal{N}(x;\mu_t,\Sigma_t)$ to equations A.12 to A.15 in the case of gaussian neurons (see equation 3.3), deriving approximate filtering equations written in terms of the population density $f(dy)$. From here on, we use $\mu_t,\Sigma_t$, and $p_t^N$ to refer to the ADF approximation rather than to the exact values.

We use the following algebraic results. The first is a slightly generalized form of a well-known result about the sum of quadratic forms, which is useful for multiplying gaussians with possibly degenerate precision matrices:

Claim 1. Let $x,a,b\in\mathbb{R}^n$ and let $A,B\in\mathbb{R}^{n\times n}$ be symmetric matrices such that $A+B$ is nonsingular. Then
$\|x-a\|_A^2+\|x-b\|_B^2=\|a-b\|_{A(A+B)^{-1}B}^2+\left\|x-(A+B)^{-1}(Aa+Bb)\right\|_{A+B}^2.$
Proof.

The proof is accomplished by straightforward expansion of each side. $□$
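Claim 1 can also be checked numerically in the scalar case, where the weighted norms reduce to $A(x-a)^2$ and so on. A minimal sketch (the values are arbitrary and include a degenerate case $B=0$, which the claim permits since only $A+B$ must be nonsingular):

```python
# Scalar instance of claim 1:
#   A(x-a)^2 + B(x-b)^2
#     = (AB/(A+B))(a-b)^2 + (A+B)(x - (Aa+Bb)/(A+B))^2,
# valid whenever A+B is nonzero (A or B alone may be zero).
def lhs(x, a, b, A, B):
    return A * (x - a) ** 2 + B * (x - b) ** 2

def rhs(x, a, b, A, B):
    m = (A * a + B * b) / (A + B)   # precision-weighted combination
    return (A * B / (A + B)) * (a - b) ** 2 + (A + B) * (x - m) ** 2
```

This is the scalar version of the "product of gaussians" computation used repeatedly below.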

Next, we note two matrix inversion lemmas, the first of which is known as the Woodbury identity. These are useful for transferring variance matrices between state and perceptual coordinates in our model. Derivations may be found in Henderson and Searle (1981).

Claim 2. For $U,V^\intercal\in\mathbb{R}^{m\times n}$ and nonsingular $A\in\mathbb{R}^{m\times m}$, $C\in\mathbb{R}^{n\times n}$, the following equalities hold,
$\left(A+UCV\right)^{-1}=A^{-1}-A^{-1}U\left(C^{-1}+VA^{-1}U\right)^{-1}VA^{-1},\qquad A^{-1}U\left(C^{-1}+VA^{-1}U\right)^{-1}=\left(A+UCV\right)^{-1}UC,$
(A.16)
whenever all the relevant inverses exist.
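Both identities in equation A.16 can be verified numerically; the sketch below uses hand-rolled $2\times2$ helpers and arbitrary example matrices (assumed to keep all inverses well defined):

```python
# 2x2 matrix helpers (lists of rows); no external dependencies
def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def inv(X):
    d = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

# arbitrary example matrices
A = [[3.0, 1.0], [0.5, 2.0]]
U = [[1.0, 0.0], [2.0, 1.0]]
C = [[2.0, 0.3], [0.1, 1.0]]
V = [[0.4, 1.0], [1.0, 0.2]]

Ainv = inv(A)
inner = inv(add(inv(C), mul(V, mul(Ainv, U))))   # (C^-1 + V A^-1 U)^-1

# first identity (Woodbury): (A+UCV)^-1 = A^-1 - A^-1 U inner V A^-1
left = inv(add(A, mul(U, mul(C, V))))
right = sub(Ainv, mul(mul(Ainv, U), mul(inner, mul(V, Ainv))))

# second identity: A^-1 U inner = (A+UCV)^-1 U C
left2 = mul(mul(Ainv, U), inner)
right2 = mul(left, mul(U, C))
```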
To evaluate the posterior expectations in equations A.12 to A.15, we first simplify the expression
$p_t^N(x)\,\omega_t(x;y)=\frac{p_t^N(x)\,\lambda(x;y)}{\int p_t^N(\xi)\,\lambda(\xi;y)\,d\xi}-p_t^N(x).$
(A.17)
Using the gaussian ADF approximation $p_t^N(x)=\mathcal{N}(x;\mu_t,\Sigma_t)$, equation 3.13, and claim 1, we find
$p_t^N(x)\,\lambda(x;h,\theta,H,R)=h\,\mathcal{N}(x;\mu_t,\Sigma_t)\exp\left(-\tfrac{1}{2}\|Hx-\theta\|_R^2\right)=\frac{h\exp\left(-\tfrac{1}{2}\|x-\mu_t\|_{\Sigma_t^{-1}}^2-\tfrac{1}{2}\|Hx-\theta\|_R^2\right)}{\sqrt{(2\pi)^n|\Sigma_t|}}=\frac{h}{\sqrt{(2\pi)^n|\Sigma_t|}}\exp\left(-\tfrac{1}{2}\left\|H_r^{-1}\theta-\mu_t\right\|_{Q_t}^2-\tfrac{1}{2}\left\|x-\mu_t^\theta\right\|_{\Sigma_t^{-1}+H^\intercal RH}^2\right),$
where $Hr-1$ is any right inverse of $H$, and
$Q_t\triangleq\Sigma_t^{-1}\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}H^\intercal RH,$
(A.18)
$\mu_t^\theta\triangleq\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}\left(\Sigma_t^{-1}\mu_t+H^\intercal R\theta\right).$
(A.19)
To simplify the notation, we suppress the dependence of these and other quantities on $H$ and $R$ throughout this section. Claim 2 establishes the relation $Q_t=H^\intercal S_tH$, where
$S_t\triangleq\left(R^{-1}+H\Sigma_tH^\intercal\right)^{-1},$
yielding
$p_t^N(x)\,\lambda(x;h,\theta,H,R)=h\,(2\pi)^{-n/2}|\Sigma_t|^{-1/2}\exp\left(-\tfrac{1}{2}\|\theta-H\mu_t\|_{S_t}^2-\tfrac{1}{2}\left\|x-\mu_t^\theta\right\|_{\Sigma_t^{-1}+H^\intercal RH}^2\right),$
(A.20)
and by normalizing this gaussian (see equation A.17), we find that
$p_t^N(x)\,\omega_t(x;h,\theta,H,R)=\mathcal{N}\left(x;\mu_t^\theta,\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}\right)-p_t^N(x),$
yielding
$E_t^N[\omega_t(y)X_t]=\mu_t^\theta-\mu_t,\qquad E_t^N[\omega_t(y)\tilde{X}_t\tilde{X}_t^\intercal]=\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}+\left(\mu_t-\mu_t^\theta\right)\left(\mu_t-\mu_t^\theta\right)^\intercal-\Sigma_t.$
(A.21)
Using claim 2, the difference $\mu_t^\theta-\mu_t$ may be rewritten as
$\mu_t^\theta-\mu_t=\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}\left(\Sigma_t^{-1}\mu_t+H^\intercal R\theta\right)-\mu_t=-\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}H^\intercal RH\mu_t+\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}H^\intercal R\theta=-\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}H^\intercal R\,\delta_t=-\Sigma_tH^\intercal S_t\delta_t,$
where $\delta_t=H\mu_t-\theta$, and an application of the Woodbury identity, equation A.16, yields
$\left(\Sigma_t^{-1}+H^\intercal RH\right)^{-1}-\Sigma_t=\Sigma_t\left[\left(I+H^\intercal RH\Sigma_t\right)^{-1}-I\right]=\Sigma_t\left[I-H^\intercal\left(R^{-1}+H\Sigma_tH^\intercal\right)^{-1}H\Sigma_t-I\right]=-\Sigma_tH^\intercal S_tH\Sigma_t.$
Now,
$E_t^N[\omega_t(y)X_t]=-\Sigma_tH^\intercal S_t\delta_t,\qquad E_t^N[\omega_t(y)\tilde{X}_t\tilde{X}_t^\intercal]=\Sigma_tH^\intercal\left(S_t\delta_t\delta_t^\intercal S_t-S_t\right)H\Sigma_t.$
Plugging this result into equations A.12 to A.15 yields equations 3.16a to 3.16d. Integrating equation A.20 over $x$ yields
$\hat{\lambda}_t(h,\theta,H,R)=\frac{h}{\sqrt{\left|I+\Sigma_tH^\intercal RH\right|}}\exp\left(-\tfrac{1}{2}\|\theta-H\mu_t\|_{S_t}^2\right).$
Sylvester's determinant identity yields the equality $\left|I+\Sigma_tH^\intercal RH\right|=|R|/|S_t|$, from which equation 3.16g follows.
The continuous precision update, equation 3.16h, follows directly from equation 3.16b and the relation
$\frac{d\Sigma_t^{-1}}{dt}=-\Sigma_t^{-1}\frac{d\Sigma_t}{dt}\Sigma_t^{-1},$
which holds whenever $\Sigma_t$ is differentiable—that is, between spikes. To derive equation 3.16i, consider a spike at time $t$ with mark $y=(h,\theta,H,R)$. According to equation 3.16d,
$\Sigma_t^{-1}=\left(\Sigma_{t^-}-\Sigma_{t^-}H^\intercal S_{t^-}H\Sigma_{t^-}\right)^{-1}=\left(I-H^\intercal S_{t^-}H\Sigma_{t^-}\right)^{-1}\Sigma_{t^-}^{-1}=\left(I+H^\intercal\left(S_{t^-}^{-1}-H\Sigma_{t^-}H^\intercal\right)^{-1}H\Sigma_{t^-}\right)\Sigma_{t^-}^{-1}\quad\text{(Woodbury identity)}=\Sigma_{t^-}^{-1}+H^\intercal RH\quad\text{(from equation 3.16f)}.$
Finally, equations A.14 and A.21 yield $d\mu_t^N=\int\left(\mu_{t^-}^\theta-\mu_{t^-}\right)N(dt,dy)$, so at spike times,
$\mu_{t^+}=\mu_{t^-}^\theta=\left(\Sigma_{t^-}^{-1}+H^\intercal RH\right)^{-1}\left(\Sigma_{t^-}^{-1}\mu_{t^-}+H^\intercal R\theta\right).$
The finite-population case is equation 3.12.
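The resulting spike-time update has a simple scalar form: the posterior precision increases by $H^\intercal RH$ and the mean moves toward the mark. A minimal sketch, assuming the one-dimensional case with scalar $H$ and $R$ (the default values here are illustrative):

```python
def spike_update(mu, sigma2, theta, H=1.0, R=1.0):
    """Scalar ADF update of (mean, variance) at a spike with mark theta:
    the posterior precision gains H*R*H, and the mean is a precision-weighted
    combination of the prespike mean and the mark."""
    prec_plus = 1.0 / sigma2 + H * R * H       # precision after the spike
    sigma2_plus = 1.0 / prec_plus
    mu_plus = sigma2_plus * (mu / sigma2 + H * R * theta)
    return mu_plus, sigma2_plus
```

For example, with prior $(\mu,\sigma^2)=(0,1)$, tuning precision $R=1$, and a spike with mark $\theta=2$, the update halves the variance and moves the mean halfway to the mark.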

A.5  Approximation of Continuous Terms for Specific Population Distributions

A.5.1  Gaussian Population

When the preferred stimuli are normally distributed, equation 3.19, the continuous update terms, equations 3.16a and 3.16b, may be computed analogously to the derivation in section A.4. First, starting from equation 3.16g, a computation analogous to the derivation of equations A.20 and 3.16g above yields
$\hat{\lambda}_t(h,\theta,H,R)\,f(d\theta)=\hat{\lambda}_t^f\cdot\mathcal{N}\left(\theta;\mu_t^f,\left(\Sigma_{\mathrm{pop}}^{-1}+S_t\right)^{-1}\right)d\theta,$
where
$\mu_t^f\triangleq\left(\Sigma_{\mathrm{pop}}^{-1}+S_t\right)^{-1}\left(S_tH\mu_t+\Sigma_{\mathrm{pop}}^{-1}c\right).$
Integrating this equation over $\theta$ and applying Sylvester's determinant lemma as in the derivation of equation 3.16g yields equation 3.20d. The matrix $Z_t(H,R)$ (see equation 3.20c) and the vector $\mu_t^f$ play a role analogous to that of $S_t$ and $\mu_t^\theta$, respectively, in section A.4. Substituting into equations 3.16a and 3.16b and simplifying analogously yields equation 3.20.

A.5.2  Uniform Population on an Interval

In this case $R,h$ are constant, $m=n=1$ and $H=1$, and the preferred stimulus distribution is $f(d\theta)=\mathbf{1}_{[a,b]}(\theta)\,d\theta$, so equations 3.16a and 3.16b take the form
$d\mu_t^c=\frac{\sigma_t^2}{\sigma_t^2+\alpha^2}\int_a^b\left(\mu_t-\theta\right)\hat{\lambda}_t(\theta)\,d\theta\,dt,$
(A.22)
$d\sigma_t^{2,c}=\left(\hat{\lambda}_t^f-\int_a^b\frac{(\theta-\mu_t)^2\,\hat{\lambda}_t(\theta)\,d\theta}{\sigma_t^2+\alpha^2}\right)\frac{\sigma_t^2}{\sigma_t^2+\alpha^2}\,\sigma_t^2\,dt,\qquad\hat{\lambda}_t(\theta)=h\sqrt{2\pi\alpha^2}\,\mathcal{N}\left(\theta;\mu_t,\sigma_t^2+\alpha^2\right),$
(A.23)
where $\sigma_t^2=\Sigma_t$ and $\alpha^2=R^{-1}$, and we suppress the dependence of $\hat{\lambda}$ on $h,R$ from the notation, since $h,R$ are fixed. The integrals can be computed from the following identities,
$\int_a^b\mathcal{N}(x;\mu,\sigma^2)\,(x-\mu)\,dx=-z\sigma,\qquad\int_a^b\mathcal{N}(x;\mu,\sigma^2)\,(x-\mu)^2\,dx=\left(Z-z'\right)\sigma^2,$
where
$a'\triangleq\frac{a-\mu}{\sigma},\quad b'\triangleq\frac{b-\mu}{\sigma},\quad Z\triangleq\int_{a'}^{b'}\phi(u)\,du=\Phi(b')-\Phi(a'),\quad z\triangleq\phi(b')-\phi(a'),\quad z'\triangleq b'\phi(b')-a'\phi(a'),\quad\phi(x)\triangleq\mathcal{N}(x;0,1).$
Writing
$a_t'\triangleq\frac{a-\mu_t}{\sqrt{\sigma_t^2+\alpha^2}},\quad b_t'\triangleq\frac{b-\mu_t}{\sqrt{\sigma_t^2+\alpha^2}},\quad Z_t\triangleq\Phi(b_t')-\Phi(a_t'),\quad z_t\triangleq\phi(b_t')-\phi(a_t'),\quad z_t'\triangleq b_t'\phi(b_t')-a_t'\phi(a_t'),$
we find that
$\int_a^b(\mu_t-\theta)\,\hat{\lambda}_t(\theta)\,d\theta=h\sqrt{2\pi\alpha^2}\int_a^b(\mu_t-\theta)\,\mathcal{N}(\theta;\mu_t,\sigma_t^2+\alpha^2)\,d\theta=h\sqrt{2\pi\alpha^2}\,z_t\sqrt{\sigma_t^2+\alpha^2},\qquad\int_a^b(\theta-\mu_t)^2\,\hat{\lambda}_t(\theta)\,d\theta=h\sqrt{2\pi\alpha^2}\,(Z_t-z_t')\left(\sigma_t^2+\alpha^2\right),\qquad\hat{\lambda}_t^f=\int_a^b\hat{\lambda}_t(\theta)\,d\theta=h\sqrt{2\pi\alpha^2}\,Z_t.$

Substitution into equations A.22 and A.23 yields equation 3.23.
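The moment identities underlying this computation are easy to verify numerically. A sketch with arbitrary interval and gaussian parameters (not values from the letter):

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def closed_moments(a, b, mu, sigma):
    """First and second central moments of N(mu, sigma^2) over [a, b],
    via Z, z, z' as defined above: (-z*sigma, (Z - z')*sigma^2)."""
    ap, bp = (a - mu) / sigma, (b - mu) / sigma
    Z = Phi(bp) - Phi(ap)
    z = phi(bp) - phi(ap)
    zp = bp * phi(bp) - ap * phi(ap)
    return -z * sigma, (Z - zp) * sigma ** 2

def numeric_moments(a, b, mu, sigma, n=100000):
    """Midpoint-rule evaluation of the same integrals."""
    dx = (b - a) / n
    m1 = m2 = 0.0
    for kk in range(n):
        x = a + (kk + 0.5) * dx
        w = phi((x - mu) / sigma) / sigma * dx
        m1 += (x - mu) * w
        m2 += (x - mu) ** 2 * w
    return m1, m2
```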

Appendix B:  Implementation Details

B.1  State Dynamics

All simulations in this letter use linear dynamics of the form
$dX_t=AX_t\,dt+D\,dW_t,$
(B.1)
which are implemented via a straightforward Euler scheme. Specifically, for step size $\Delta t$, we approximate $X_{k\Delta t}$ by $x_k$, where
$x_0=X_0,\qquad x_{k+1}=x_k+Ax_k\,\Delta t+D\,\xi_k\sqrt{\Delta t},$
and $\xi_k$ is a sequence of independent standard normal variables (independent of $X_0$).
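A minimal sketch of this Euler scheme in the one-dimensional case (the parameter values in the usage note are illustrative only):

```python
import math
import random

def simulate_linear_sde(A, D, x0, T, dt, rng):
    """Euler scheme for the 1d case of equation B.1: dX = A*X dt + D dW.
    Each step adds the drift A*x*dt and a Wiener increment D*xi*sqrt(dt)."""
    n = int(round(T / dt))
    xs = [x0]
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)   # standard normal variable
        xs.append(xs[-1] + A * xs[-1] * dt + D * xi * math.sqrt(dt))
    return xs
```

With $D=0$ the scheme reduces to explicit Euler for $\dot{x}=Ax$, which is a convenient correctness check.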

B.2  Continuous Neural Population

The simulation of marked point processes, used to model continuous neural populations (see section 3.2.2), involves the generation of random times and corresponding random marks. In the case of a finite population, there is a finite number of marks, and the point process corresponding to each mark may be simulated separately. For an infinite population, a different approach is required.

Given the intensity kernel $\lambda(X_t;y)\,f(dy)$, we simulate a marked point process in two stages: first generating the random times (spike times) and then the random marks (neuron parameters). To generate the random times $t_1,\ldots,t_{N_T}$, note that the total history-conditioned firing rate at time $t$ is given by
$r(X_t)\triangleq\int_{\mathcal{Y}}\lambda(X_t;y)\,f(dy),$
and the unmarked process $N_t$ is a doubly stochastic Poisson process with random rate $r(X_t)$. Conditioned on $X$ and the point-process history, each random mark $y_i$ is distributed as
$y_i\,|\,X,t_1,\ldots,t_i,y_1,\ldots,y_{i-1}\sim\kappa(X_{t_i},dy)\triangleq\frac{\lambda(X_{t_i};y)\,f(dy)}{r(X_{t_i})}$
(see Brémaud, 1981, theorem T6).

Accordingly, the simulation of the marked process $N$ proceeds as follows:

1. Using the generated trajectory $x_k\approx X_{k\Delta t}$, simulate a Poisson process with rate $r(X_t)$, yielding the random times $(t_1,\ldots,t_{N_T})$. This may be accomplished either via direct generation of a point in each time step with probability $r(x_k)\Delta t$ or more efficiently via time rescaling (see, e.g., Brown, Barbieri, Ventura, Kass, & Frank, 2002).

2. Generate random marks $(y_1,\ldots,y_{N_T})$ by sampling independently from the distribution $\kappa(X_{t_i},dy)$.

As in section 3.2.3, when $h,H,R$ are fixed across the population, we abuse notation and write $f(dh',d\theta,dH',dR')=\delta_h(dh')\,\delta_H(dH')\,\delta_R(dR')\,f(d\theta)$, and similarly for $\kappa(x;d\theta)$. The function $r(x)$ and the distribution $\kappa(x;d\theta)$ for each of the distributions of preferred stimuli in section 3.2.3 are given in closed form in Table 3. For a finite heterogeneous mixture population, $r$ and $\kappa$ may be obtained through the appropriate weighted summation; however, it is easier to simulate each component separately.
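The two-stage procedure can be sketched for the simplest nontrivial case, a Lebesgue distribution of preferred stimuli with $m=n=1$ and $H=1$, where the total rate $r(x)=h\sqrt{2\pi/R}$ happens to be state independent. Spikes are drawn per time step with probability $r(x_k)\Delta t$ (assumed small), and marks are drawn from $\kappa$:

```python
import math
import random

def simulate_marked_pp(xs, dt, h, R, rng):
    """Two-stage simulation of the marked point process (1d, H = 1,
    Lebesgue distribution of preferred stimuli):
      stage 1: a spike in each step with probability r(x)*dt,
               with r(x) = h*sqrt(2*pi/R) (requires r*dt << 1);
      stage 2: a mark from kappa(x; dtheta) = N(theta; x, 1/R) dtheta."""
    r = h * math.sqrt(2.0 * math.pi / R)
    times, marks = [], []
    for k, x in enumerate(xs):
        if rng.random() < r * dt:                           # stage 1: spike time
            times.append(k * dt)
            marks.append(rng.gauss(x, 1.0 / math.sqrt(R)))  # stage 2: mark
    return times, marks
```

For a state-dependent $r(x)$ (the other rows of Table 3), only the per-step acceptance probability and the mark distribution change.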

Table 3:
Total Population Rates $r(x)$ and Mark Sampling Distributions $\kappa(x;d\theta)$ for the Preferred Stimulus Distributions $f(d\theta)$ of Section 3.2.3 with Gaussian Tuning $\lambda(x;\theta)=h\exp\left(-\tfrac{1}{2}\|Hx-\theta\|_R^2\right)$.
| $f(d\theta)$ | $r(x)$ | $\kappa(x;d\theta)$ |
| --- | --- | --- |
| $\delta_{\theta_0}(d\theta)$ | $\lambda(x;\theta_0)$ | $\delta_{\theta_0}(d\theta)$ |
| $d\theta$ | $h\sqrt{(2\pi)^m/\det R}$ | $\mathcal{N}(\theta;Hx,R^{-1})\,d\theta$ |
| $\mathcal{N}(\theta;c,G)\,d\theta$ | $h\sqrt{(2\pi)^m/\det R}\;\mathcal{N}(c;Hx,R^{-1}+G)$ | $\mathcal{N}\left(\theta;GR_GHx+R^{-1}R_Gc,(R+G^{-1})^{-1}\right)d\theta$, where $R_G=(R^{-1}+G)^{-1}$ |
| $\mathbf{1}_{a\le\theta\le b}\,d\theta$ | $h\sqrt{2\pi/R}\left(\Phi(z_b(x))-\Phi(z_a(x))\right)$, where $z_s(x)=\sqrt{R}(s-Hx)$ | $\mathcal{N}_{[a,b]}(\theta;Hx,R^{-1})\,d\theta$ (truncated normal distribution) |

Note: The derivations of these closed forms are straightforward for the Dirac and uniform distributions; the derivation for the gaussian distribution is by multiplication of gaussians, completely paralleling the computations in section A.4.

B.3  Filter

Similar to the state dynamics, we approximate the filter equations
$d\mu_t=d\mu_t^\pi+d\mu_t^c+d\mu_t^N,\qquad d\Sigma_t=d\Sigma_t^\pi+d\Sigma_t^c+d\Sigma_t^N,$
using an Euler approximation,
$\mu_{(k+1)\Delta t}=\mu_{k\Delta t}+\left(\frac{d\mu_t^\pi}{dt}+\frac{d\mu_t^c}{dt}\right)\Delta t+d\mu_t^N\bigg|_{t=k\Delta t},\qquad\Sigma_{(k+1)\Delta t}=\Sigma_{k\Delta t}+\left(\frac{d\Sigma_t^\pi}{dt}+\frac{d\Sigma_t^c}{dt}\right)\Delta t+d\Sigma_t^N\bigg|_{t=k\Delta t}.$

For the linear dynamics (see equation B.1) used in simulations throughout this letter, the prior terms $d\mu_t^\pi,d\Sigma_t^\pi$ are given by equation 3.8. The continuous update terms $d\mu_t^c,d\Sigma_t^c$ depend on the population as described in section 3.2.3. The discontinuous updates $d\mu_t^N,d\Sigma_t^N$ are given by equations 3.16c and 3.16d and are nonzero only at time steps containing a spike.

Appendix C:  Variance as Proxy for MSE

In section 5, we studied optimal encoding using the posterior variance as a proxy for the MSE. Letting $\mu_t,\Sigma_t$ denote the approximate posterior moments given by the filter, the MSE and posterior variance are related as follows,
$\mathrm{MSE}_t\triangleq E\left[\mathrm{tr}\left(X_t-\mu_t\right)\left(X_t-\mu_t\right)^\intercal\right]=E\left[E_t^N\,\mathrm{tr}\left(X_t-\mu_t\right)\left(X_t-\mu_t\right)^\intercal\right]=E\left[\mathrm{tr}\,\mathrm{Var}_t^N[X_t]\right]+E\left[\mathrm{tr}\left(\mu_t-E_t^N[X_t]\right)\left(\mu_t-E_t^N[X_t]\right)^\intercal\right],$
where $E_t^N[\cdot],\mathrm{Var}_t^N[\cdot]$ are, respectively, the mean and covariance matrix conditioned on $N_t$, and $\mathrm{tr}$ is the trace operator. Thus, for an exact filter, with $\mu_t=E_t^N[X_t]$ and $\Sigma_t=\mathrm{Var}_t^N[X_t]$, we would have $\mathrm{MSE}_t=E[\mathrm{tr}\,\Sigma_t]$. Conversely, if we find that $\mathrm{MSE}_t\approx E[\mathrm{tr}\,\Sigma_t]$, it suggests that the filter's errors are small (though this is not guaranteed, since errors in $\mu_t$ and in $\Sigma_t$ may affect the MSE in opposite directions, for example, when the variance is underestimated).
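The equality of MSE and expected posterior variance for an exact filter can be illustrated in a static conjugate-gaussian toy problem (not the model of this letter), where the exact posterior is available in closed form:

```python
import math
import random

def trial(rng, prior_var=1.0, noise_var=0.5):
    """One draw of a conjugate-gaussian toy problem with an exact posterior:
    X ~ N(0, prior_var), Y = X + gaussian noise."""
    x = rng.gauss(0.0, math.sqrt(prior_var))
    y = x + rng.gauss(0.0, math.sqrt(noise_var))
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)  # exact posterior variance
    post_mean = post_var * y / noise_var                  # exact posterior mean
    return (x - post_mean) ** 2, post_var

rng = random.Random(1)
n = 200000
se = pv = 0.0
for _ in range(n):
    e2, v = trial(rng)
    se += e2
    pv += v
mse, mean_var = se / n, pv / n   # for an exact filter these should agree
```

Here the posterior variance is constant ($1/3$ for the default parameters), and the Monte Carlo MSE matches it up to sampling noise; for an approximate filter such as ADF, the gap between the two quantities is exactly the bias term above.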

Figure 13 shows the variance and MSE in estimating the same process as in Figure 10 after averaging across 10,000 trials. The results indicate that the filter has good accuracy with these parameters, so that the variance is a reasonable approximation for the MSE.

Figure 13:

Posterior variance versus MSE when filtering the one-dimensional process $dX_t=aX_t\,dt+s\,dW_t$ of Figure 10. As in Figure 10, parameters on the left are $a=-0.05$, $s=0.1$, and on the right, $a=-0.05$, $s=0.5$. The sensory population is gaussian (see equations 3.19 and 3.13) with population parameters $c=0$, $\sigma_{\mathrm{pop}}^2=4$ and tuning parameters $\alpha=1$, $h=50$. The top plots show the normalized measured bias $\langle\mu_t-X_t\rangle/\langle\sigma_t\rangle$, where $\langle\cdot\rangle$ denotes averaging across trials. The center plots show the MSE $\langle(\mu_t-X_t)^2\rangle$ and mean posterior variance $\langle\sigma_t^2\rangle$. The bottom plots show the ratio of means $\langle(\mu_t-X_t)^2\rangle/\langle\sigma_t^2\rangle$ and the mean ratio $\langle(\mu_t-X_t)^2/\sigma_t^2\rangle$. The means were taken across 10,000 trials. Shaded areas indicate 95% confidence intervals obtained via bootstrapping.


Figure 14 is a variant of Figure 10 showing the MSE rather than the variance. The results are noisier but qualitatively similar. The largest differences are observed in Figure 13b for small population variance, where the ADF estimation is poor due to very few spikes occurring.

Figure 14:

Optimal population distribution depends on prior variance relative to tuning curve width. This figure is based on the same data as Figure 10 in the main text, with MSE plotted instead of estimated posterior variance. See Figure 10 for more details.


Appendix D:  Comparison to Previous Work: Additional Details

As noted in section 6.1, we are aware of a single previous work (Eden & Brown, 2008) deriving a closed-form filter for nonuniform coding of a diffusion process in continuous time. We now present a detailed comparison of our work to Eden and Brown (2008).

The filter derived in Eden and Brown (2008) is a continuous-time version of a discrete-time filter presented in Eden, Frank, Barbieri, Solo, and Brown (2004). The setting of Eden et al. (2004) involves linear discrete-time state dynamics and a finite population of neurons with arbitrary tuning functions, firing independently (given the state). The derivation of the filter relies on an approximation applied to the measurement update for the posterior density,
$\log p(X_k \mid N_k) = \log p(X_k \mid N_{k-1}) + \log P(\Delta N_k \mid X_k) + \mathrm{const},$
where $X_k$ is the external state at time $t_k$, $N_k$ is the spiking history up to time $t_k$, and $\Delta N_k$ is the spike count in the interval $(t_{k-1}, t_k]$. A gaussian density is substituted for each of the terms $p(X_k \mid N_k)$, $p(X_k \mid N_{k-1})$, and each side is expanded to a second-order Taylor series about the point $E[X_k \mid N_{k-1}]$. This yields equations relating the first two moments of $p(X_k \mid N_k)$ to those of $p(X_k \mid N_{k-1})$. The time update equations for the first two moments (relating $p(X_{k+1} \mid N_k)$ to $p(X_k \mid N_k)$) require no approximation, since the dynamics are linear. The resulting discrete-time filtering equations (equations 2.7 to 2.10 in Eden et al., 2004) depend on the gradient and Hessian of the logarithm of the tuning functions at the point $E[X_k \mid N_{k-1}]$. The continuous-time version (equations 6.3 and 6.4 in Eden & Brown, 2008) is derived by taking the limit as the time discretization step $\Delta t$ approaches 0.
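To make the structure of this approximation concrete, here is a minimal Python sketch of a single measurement update for one neuron with a one-dimensional gaussian tuning curve. The parameter values are illustrative, and the code is a paraphrase of the Taylor-expansion step described above, not the authors' implementation.

```python
import numpy as np

# Hypothetical 1D gaussian tuning curve: lambda(x) = h * exp(-(x - theta)^2 / (2 alpha^2)).
h, theta, alpha = 10.0, 1.0, 1.0

def rate(x):
    return h * np.exp(-(x - theta) ** 2 / (2 * alpha ** 2))

def measurement_update(mu_prior, var_prior, dN, dt):
    """One approximate measurement update in the spirit of Eden et al. (2004):
    expand log P(dN | x) = dN * log(lambda(x) * dt) - lambda(x) * dt to second
    order about the predicted mean, then match gaussian moments."""
    x = mu_prior
    # Analytic derivatives of log lambda for the gaussian tuning curve.
    dlog = -(x - theta) / alpha ** 2
    d2log = -1.0 / alpha ** 2
    lam = rate(x)
    # First and second derivatives of the log-likelihood at the predicted mean.
    g = dN * dlog - lam * dlog * dt
    H = dN * d2log - (lam * dlog ** 2 + lam * d2log) * dt
    var_post = 1.0 / (1.0 / var_prior - H)
    mu_post = mu_prior + var_post * g
    return mu_post, var_post

# A spike (dN = 1) pulls the posterior mean toward the preferred stimulus theta.
mu, var = measurement_update(mu_prior=0.0, var_prior=1.0, dN=1, dt=1e-3)
print(mu, var)
```

Note that the update uses only local (gradient and Hessian) information about the tuning curve at the predicted mean, which is the key difference from the ADF derivation below.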

In contrast, the starting point of our derivation is the exact update equations for the first two moments, expressed in terms of posterior expectations. A gaussian posterior is substituted into these expectations, yielding integrals that are tractable precisely because the tuning functions are gaussian.

To compare the resulting filters, we consider a finite population of gaussian neurons: this is the intersection of the setting of the current work with that of Eden and Brown (2008). In this case, the filtering equations in Eden and Brown (2008) yield the same discontinuous update terms, but the continuous update terms take the form
$d\mu_t^{c,\mathrm{EB}} = \sum_i \lambda_i(\mu_t)\, \Sigma_t H_i^{\top} R_i \delta_t^i \, dt,$
(D.1a)
$d\Sigma_t^{c,\mathrm{EB}} = \sum_i \lambda_i(\mu_t)\, \Sigma_t H_i^{\top} \left( R_i - R_i \delta_t^i (\delta_t^i)^{\top} R_i \right) H_i \Sigma_t \, dt,$
(D.1b)
where
$\delta_t^i \triangleq H_i \mu_t - \theta_i$
(see section D.1). These equations differ from equations 3.9a and 3.9b in the use of the tuning function shape matrix $R_i$ in place of $S_t^i$ (defined in equation 3.9f), and of $\lambda_i(\mu_t)$ in place of $\hat{\lambda}_t^i$. The difference between $\lambda_i(\mu_t)$ and $\hat{\lambda}_t^i$ similarly involves substituting $R_i$ for $S_t^i$,
$\lambda_i(\mu_t) = h \exp\left( -\tfrac{1}{2} (\delta_t^i)^{\top} R_i \delta_t^i \right), \qquad \hat{\lambda}_t^i = h \sqrt{\frac{|S_t^i|}{|R_i|}} \exp\left( -\tfrac{1}{2} (\delta_t^i)^{\top} S_t^i \delta_t^i \right).$
Since $S_t^i = (R_i^{-1} + H_i \Sigma_t H_i^{\top})^{-1}$, our filtering equations take the posterior covariance $\Sigma_t$ into account in several places where it is absent from equations D.1. Note that when $\Sigma_t = 0$, we have $R_i = S_t^i$ and $\lambda_i(\mu_t) = \hat{\lambda}_t^i$, so the two filters become increasingly similar as $\Sigma_t \to 0$. We refer to the filter of equations D.1 as the Eden-Brown (EB) filter.
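The relation between $\lambda_i(\mu_t)$ and $\hat{\lambda}_t^i$ is easy to check numerically in one dimension (with $H_i = 1$, $R_i = 1/\alpha^2$): the posterior-expected rate is a widened tuning curve that approaches $\lambda_i(\mu_t)$ as $\Sigma_t \to 0$. The following Python sketch, with illustrative parameter values, verifies both the limit and the gaussian-integral formula by Monte Carlo.

```python
import numpy as np

# Illustrative 1D parameters: gain h, tuning width alpha, preferred stimulus theta.
h, alpha, theta = 50.0, 1.0, 0.5

def lam(mu):
    """Tuning curve evaluated at the posterior mean (as used by the EB filter)."""
    return h * np.exp(-(mu - theta) ** 2 / (2 * alpha ** 2))

def lam_hat(mu, sigma2):
    """Expected rate under a N(mu, sigma2) posterior (as used by the ADF filter):
    S = (R^{-1} + sigma2)^{-1} with R = 1/alpha^2, giving a widened, lower curve."""
    s = 1.0 / (alpha ** 2 + sigma2)
    return h * np.sqrt(s * alpha ** 2) * np.exp(-0.5 * (mu - theta) ** 2 * s)

mu = 0.0
# Monte Carlo check of the gaussian integral E[lambda(X)] with X ~ N(mu, 0.1).
x = np.random.default_rng(1).normal(mu, np.sqrt(0.1), 200_000)
mc = lam(x).mean()
print(mc, lam_hat(mu, 0.1))        # Monte Carlo agrees with the closed form
print(lam(mu), lam_hat(mu, 1e-6))  # nearly identical as sigma2 -> 0
```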

We compared the performance of our filter and of equations D.1 in simulations of a simple one-dimensional setup. Figure 15 shows an example of filtering a static one-dimensional state observed through two gaussian neurons (see equation 3.3), using both the ADF approximation, equations 3.9a and 3.9b, and the EB filter, equations D.1a and D.1b. Since the mean square error is highly sensitive to outliers, in which the approximation fails and the filtering error becomes large, we compare the filters through the distribution of absolute errors (AE) in estimating the state. Figure 16 compares the AE of the two filters. As expected from our analysis, the ADF filter has an advantage, particularly at earlier times, when the posterior variance is large. This is seen most clearly in the 95th percentile of AEs in Figure 16a and in the tail histograms in Figure 16c, but a small advantage may also be observed for the median in Figure 16d. However, in some trials, the error of the EB filter remains large throughout.

Figure 15:

An example of a static 1D state observed through two sensory neurons and filtered by ADF, equations 3.9a and 3.9b, and by the EB filter, equations D.1a and D.1b. Each dot corresponds to a spike, with the vertical location indicating the neuron's preferred stimulus $\theta$. The approximate posterior means obtained from ADF and the EB filter are shown in orange and blue, respectively, with the corresponding posterior standard deviations in shaded areas of the respective colors. The curves to the right of the graph show the tuning functions of the two neurons. Parameters are $h=10$, $H=1$, $R^{-1}=0.5$, $\mu_0=0$, $\sigma_0^2=1$, and the neurons are centered at $-1$ and $1$. The dynamics were discretized with time step $\Delta t = 10^{-3}$.

Figure 16:

Distribution of absolute estimation errors as a function of time, collected from 10,000 trials. Parameters are the same as in Figure 15, with the state sampled from $N(0,1)$ in each trial. Preferred stimuli are noted above each subfigure. (a) Medians of the distribution of AEs for ADF (orange) and the EB filter (blue), along with 5th and 95th percentiles, as a function of time. The medians are indistinguishable at this scale. (b) Histograms of AEs at several specific times, with logarithmically spaced bins. The vertical dotted line indicates the prior median absolute error at filter initialization, $\mathrm{med}\,|X_0 - \mu_0| = \sigma_0 \Phi^{-1}(3/4)$, where $\Phi^{-1}$ is the quantile function of the standard normal distribution. (c) Histogram of AEs larger than the initial AE. (d) Normalized differences of AE percentiles $(p_{\mathrm{ADF}}^r - p_{\mathrm{EB}}^r) / \tfrac{1}{2}(p_{\mathrm{ADF}}^r + p_{\mathrm{EB}}^r)$, where $p_{\mathrm{ADF}}^r$ and $p_{\mathrm{EB}}^r$ are the $r$th quantiles of the AE distribution for the ADF and EB filters, respectively, for $r = 0.5, 0.05, 0.95$. Negative values indicate an advantage of ADF over EB. Shaded areas indicate 95% confidence intervals derived via bootstrapping (e.g., Efron & Tibshirani, 1994).

In Figure 16 (left), we chose the preferred stimuli $-0.51$ and $0.5$. These are not symmetric around 0 due to a limitation of the EB filter. When the filter is applied to a homogeneous population and the current posterior mean lies precisely at the average of the population's preferred stimuli, the posterior mean remains constant until the next spike, while the posterior variance evolves as
$d\sigma_t^{2,c,\mathrm{EB}} = \sum_i \lambda_i(\mu_t) \frac{1}{\alpha^2} \left( 1 - \frac{(\mu_t - \theta_i)^2}{\alpha^2} \right) \sigma_t^4 \, dt = \mathrm{const} \cdot \sigma_t^4 \, dt,$
which diverges in finite time if the constant coefficient is positive; the coefficient is positive when the $\theta_i$ are close to $\mu_t$. This causes divergence of the filter when the preferred stimuli are symmetric around the initial estimate $\mu_0$ and sufficiently near it, and the first spike is sufficiently delayed. To avoid this behavior, we chose preferred stimuli that are not symmetric around $\mu_0 = 0$. The asymmetry causes an eventual shift of the posterior mean in the absence of spikes, which suppresses the growth of the posterior variance. This effect may still cause high estimation errors before the first spike, as evident in Figure 16d (left).
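The finite-time divergence can be made explicit: treating the coefficient as a constant $c > 0$ and separating variables (a standard step, spelled out here for completeness),

```latex
\frac{d\sigma_t^2}{dt} = c\,\sigma_t^4
\quad \Longrightarrow \quad
\sigma_t^2 = \frac{\sigma_0^2}{1 - c\,\sigma_0^2\,t},
```

so the variance blows up at the finite time $t^* = 1/(c\,\sigma_0^2)$, unless a spike or a shift of the posterior mean intervenes first.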

D.1  Derivation of Equations D.1

In our notation, the filtering equations of Eden and Brown (2008) read
$d\mu_t^{c,\mathrm{EB}} = -\sum_i \Sigma_t \nabla \log \lambda_i(\mu_t)\, \lambda_i(\mu_t)\, dt,$
(D.2a)
$d\Sigma_t^{c,\mathrm{EB}} = -\sum_i \Sigma_t \nabla^2 \lambda_i(\mu_t)\, \Sigma_t \, dt,$
(D.2b)
$d\mu_t^{N,\mathrm{EB}} = \sum_i \Sigma_{t^+} \nabla \log \lambda_i(\mu_{t^-})\, dN_t^i,$
(D.2c)
$d\Sigma_t^{N,\mathrm{EB}} = -\sum_i \Sigma_{t^-} S_{t^-}^{\mathrm{EB}} \Sigma_{t^-}\, dN_t^i,$
(D.2d)
where $\nabla$ and $\nabla^2$ denote the gradient and Hessian, respectively, and $S_{t^-}^{\mathrm{EB}}$ is defined as
$S_t^{\mathrm{EB}} \triangleq \nabla^2 \log \lambda_i(\mu_t) \left( \Sigma_t \nabla^2 \log \lambda_i(\mu_t) - I \right)^{-1}.$

Note that this is a corrected version of the definition used in Eden and Brown (2008), which is unusable when the Hessian is singular (in our model, this occurs whenever $m < n$). The definition of $S_t^{\mathrm{EB}}$ given here extends it to the singular case.

For a gaussian firing rate, equation 3.3, the relevant gradient and Hessians are given by
$\nabla \log \lambda_i(x) = -H_i^{\top} R_i (H_i x - \theta_i), \qquad \nabla^2 \log \lambda_i(x) = -H_i^{\top} R_i H_i, \qquad \nabla^2 \lambda_i(x) = -\lambda_i(x)\, H_i^{\top} \left( R_i - R_i (H_i x - \theta_i)(H_i x - \theta_i)^{\top} R_i \right) H_i,$
so
$S_t^{\mathrm{EB}} = \nabla^2 \log \lambda_i(\mu_t) \left( \Sigma_t \nabla^2 \log \lambda_i(\mu_t) - I \right)^{-1} = -H_i^{\top} R_i H_i \left( -\Sigma_t H_i^{\top} R_i H_i - I \right)^{-1} = H_i^{\top} R_i H_i \left( \Sigma_t^{-1} + H_i^{\top} R_i H_i \right)^{-1} \Sigma_t^{-1} = H_i^{\top} \left( R_i^{-1} + H_i \Sigma_t H_i^{\top} \right)^{-1} H_i \quad \text{(using claim 2)} = H_i^{\top} S_t^i H_i,$
where $S_t^i$ is given by equation 3.9f. Substituting into equations D.2 yields the continuous updates, equations D.1a and D.1b, and the discontinuous updates
$d\mu_t^{N,\mathrm{EB}} = -\sum_i \Sigma_{t^+} H_i^{\top} R_i \delta_{t^-}^i \, dN_t^i, \qquad d\Sigma_t^{N,\mathrm{EB}} = -\sum_i \Sigma_{t^-} H_i^{\top} S_{t^-}^i H_i \Sigma_{t^-} \, dN_t^i.$
The discontinuous variance update is identical to the ADF update, equation 3.9c. The discontinuous mean update can be rewritten in terms of $\Sigma_{t^-}$ by noting that
$\Sigma_{t^+} H_i^{\top} R_i = \left( \Sigma_{t^-} - \Sigma_{t^-} H_i^{\top} S_{t^-}^i H_i \Sigma_{t^-} \right) H_i^{\top} R_i = \Sigma_{t^-} H_i^{\top} \left( I - S_{t^-}^i H_i \Sigma_{t^-} H_i^{\top} \right) R_i = \Sigma_{t^-} H_i^{\top} \left( I - \left( R_i^{-1} + H_i \Sigma_{t^-} H_i^{\top} \right)^{-1} H_i \Sigma_{t^-} H_i^{\top} \right) R_i = \Sigma_{t^-} H_i^{\top} \left( I + R_i H_i \Sigma_{t^-} H_i^{\top} \right)^{-1} R_i \quad \text{(Woodbury identity)} = \Sigma_{t^-} H_i^{\top} S_{t^-}^i,$
so the mean update may be rewritten as
$d\mu_t^{N,\mathrm{EB}} = -\sum_i \Sigma_{t^-} H_i^{\top} S_{t^-}^i \delta_{t^-}^i \, dN_t^i,$
in agreement with equation 3.9c.
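The matrix identities used in this derivation are straightforward to verify numerically. The following Python sketch checks, for random SPD matrices with illustrative dimensions ($m=2$, $n=4$, so the Hessian $-H_i^{\top} R_i H_i$ is singular), both the identity $\Sigma_{t^+} H^{\top} R = \Sigma_{t^-} H^{\top} S$ and the equality $S^{\mathrm{EB}} = H^{\top} S H$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2  # illustrative state and observation dimensions (m < n)

H = rng.normal(size=(m, n))
A = rng.normal(size=(m, m)); R = A @ A.T + np.eye(m)      # SPD tuning-shape matrix
B = rng.normal(size=(n, n)); Sigma = B @ B.T + np.eye(n)  # SPD prior covariance

S = np.linalg.inv(np.linalg.inv(R) + H @ Sigma @ H.T)     # S = (R^{-1} + H Sigma H^T)^{-1}
Sigma_post = Sigma - Sigma @ H.T @ S @ H @ Sigma          # post-spike covariance

# Identity behind the discontinuous mean update: Sigma_{t+} H^T R = Sigma_{t-} H^T S.
assert np.allclose(Sigma_post @ H.T @ R, Sigma @ H.T @ S)

# S^EB of the EB filter equals H^T S H, even though the Hessian -H^T R H is singular.
Hess = -H.T @ R @ H
S_EB = Hess @ np.linalg.inv(Sigma @ Hess - np.eye(n))
assert np.allclose(S_EB, H.T @ S @ H)
print("identities verified")
```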

Acknowledgments

The work of Y.H. and R.M. is partially supported by grant 451/17 from the Israel Science Foundation and by the Ollendorff Center of the Viterbi Faculty of Electrical Engineering at the Technion.

References

Benucci, A., Saleem, A. B., & Carandini, M. (2013). Adaptation maintains population homeostasis in primary visual cortex. Nature Neuroscience, 16(6), 724–729. doi:10.1038/nn.3382
Bethge, M., Rotermund, D., & Pawelzik, K. (2002). Optimal short-term population coding: When Fisher information fails. Neural Comput., 14(10), 2317–2351. doi:10.1162/08997660260293247
Bobrowski, O., Meir, R., & Eldar, Y. (2009). Bayesian filtering in spiking neural networks: Noise, adaptation, and multisensory integration. Neural Comput., 21(5), 1277–1320. doi:10.1162/neco.2008.01-08-692
Brand, A., Behrend, O., Marquardt, T., McAlpine, D., & Grothe, B. (2002). Precise inhibition is essential for microsecond interaural time difference coding. Nature, 417(6888), 543–547. doi:10.1038/417543a
Brémaud, P. (1981). Point processes and queues: Martingale dynamics. New York: Springer.
Brigo, D., Hanzon, B., & Le Gland, F. (1999). Approximate nonlinear filtering by projection on exponential manifolds of densities. Bernoulli, 5(3), 495–534.
Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E., & Frank, L. M. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325–346.
Chelaru, M., & Dragoi, V. (2008). Efficient coding in heterogeneous neuronal populations. Proceedings of the National Academy of Sciences, 105(42), 16344–16349.
Dayan, P., & Abbott, L. (2005). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.
Doucet, A., & Johansen, A. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan & B. Rozovskii (Eds.), Handbook of nonlinear filtering (pp. 656–704). Oxford: Oxford University Press.
Ecker, A., Berens, P., Tolias, A., & Bethge, M. (2011). The effect of noise correlations in populations of diversely tuned neurons. Journal of Neuroscience, 31(40), 14272–14283.
Eden, U. T., & Brown, E. N. (2008). Continuous-time filters for state estimation from point process models of neural data. Statistica Sinica, 18(4), 1293–1310.
Eden, U. T., Frank, L. M., Barbieri, R., Solo, V., & Brown, E. N. (2004). Dynamic analysis of neural encoding by point process adaptive filtering. Neural Computation, 16, 971–998.
Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Boca Raton, FL: CRC Press.
Frey, R., & Runggaldier, W. J. (2001). A nonlinear filtering approach to volatility estimation with a view towards high frequency data. International Journal of Theoretical and Applied Finance, 4(2), 199–210.
Ganguli, D., & Simoncelli, E. (2014). Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput., 26(10), 2103–2134. doi:10.1162/NECO_a_00638
Gill, R. D., & Levit, B. Y. (1995). Applications of the Van Trees inequality: A Bayesian Cramér-Rao bound. Bernoulli, 1(1–2), 59–79.
Harel, Y., Meir, R., & Opper, M. (2015). A tractable approximation to optimal point process filtering: Application to neural encoding. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems, 28 (pp. 1603–1611). Red Hook, NY: Curran.
Harper, N., & McAlpine, D. (2004). Optimal neural population coding of an auditory spatial cue. Nature, 430(7000), 682–686. doi:10.1038/nature02768
Henderson, H., & Searle, S. (1981). On deriving the inverse of a sum of matrices. SIAM Review, 23(1), 53–60.
Huys, Q. J. M., Zemel, R. S., Natarajan, R., & Dayan, P. (2007). Fast population coding. Neural Computation, 19(2), 404–441. doi:10.1162/neco.2007.19.2.404
Komaee, A. (2010). State estimation from space-time point process observations with an application in optical beam tracking. In Proceedings of the IEEE 44th Annual Conference on Information Sciences and Systems. Piscataway, NJ: IEEE. doi:10.1109/CISS.2010.5464949
Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399–402.
Macke, J. H., Buesing, L., & Sahani, M. (2015). Estimating state and parameters in state space models of spike trains. In Z. Chen (Ed.), Advanced state space methods for neural and clinical data. Cambridge: Cambridge University Press.
Maybeck, P. (1979). Stochastic models, estimation, and control. Orlando, FL: Academic Press.
Minka, T. (2001). Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 362–369). San Mateo, CA: Morgan Kaufmann.
Øksendal, B. (2003). Stochastic differential equations. Berlin: Springer.
Opper, M. (1998). A Bayesian approach to online learning. In D. Saad (Ed.), Online learning in neural networks (pp. 363–378). Cambridge: Cambridge University Press.
Pilarski, S., & Pokora, O. (2015). On the Cramér-Rao bound applicability and the role of Fisher information in computational neuroscience. BioSystems, 136, 11–22. doi:10.1016/j.biosystems.2015.07.009
Pillow, J. W., Shlens, J., Paninski, L., Sher, A., Litke, A. M., Chichilnisky, E., & Simoncelli, E. P. (2008). Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207), 995–999.
Rao, C. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc., 37, 81–91.
Rhodes, I., & Snyder, D. (1977). Estimation and control performance for space-time point-process observations. IEEE Transactions on Automatic Control, 22(3), 338–346. doi:10.1109/TAC.1977.1101534
Segall, A. (1976). Recursive estimation from discrete-time point processes. IEEE Transactions on Information Theory, 22(4), 422–431.
Segall, A. (1978). Centralized and decentralized control schemes for Gauss-Poisson processes. IEEE Transactions on Automatic Control, 23(1), 47–57.
Segall, A., Davis, M. H., & Kailath, T. (1975). Nonlinear filtering with counting observations. IEEE Transactions on Information Theory, 21(2), 143–149.
Segall, A., & Kailath, T. (1975). The modeling of randomly modulated jump processes. IEEE Transactions on Information Theory, 21(2), 135–143.
Snyder, D. (1972). Filtering and detection for doubly stochastic Poisson processes. IEEE Transactions on Information Theory, 18(1), 91–102. doi:10.1109/TIT.1972.1054756
Snyder, D., & Miller, M. (1991). Random point processes in time and space (2nd ed.). Berlin: Springer.
Snyder, D., Rhodes, I., & Hoversten, E. (1977). A separation theorem for stochastic control problems with point-process observations. Automatica, 13(1), 85–87.
Solo, V. (2000). "Unobserved" Monte Carlo method for identification of partially observed nonlinear state space systems, Part II: Counting process observations. In Proceedings of the 39th IEEE Conference on Decision and Control (pp. 3331–3336). Piscataway, NJ: IEEE.
Susemihl, A. (2014). Optimal population coding of dynamic stimuli. Ph.D. diss., Technical University Berlin.
Susemihl, A., Meir, R., & Opper, M. (2011). Analytical results for the error in filtering of gaussian processes. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, & K. Weinberger (Eds.), Advances in neural information processing systems, 24 (pp. 2303–2311). Red Hook, NY: Curran.
Susemihl, A., Meir, R., & Opper, M. (2013). Dynamic state estimation based on Poisson spike trains—towards a theory of optimal encoding. Journal of Statistical Mechanics: Theory and Experiment, 2013(3), P03009.
Susemihl, A., Meir, R., & Opper, M. (2014). Optimal neural codes for control and estimation. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 27 (pp. 2987–2995). Red Hook, NY: Curran.
Twum-Danso, N., & Brockett, R. (2001). Trajectory estimation from place cell data. Neural Networks, 14(6–7), 835–844. doi:10.1016/S0893-6080(01)00079-X
Wang, Z., Stocker, A. A., & Lee, D. D. (2016). Efficient neural codes that minimize Lp reconstruction error. Neural Computation, 28, 2656–2686.
Yaeli, S., & Meir, R. (2010). Error-based analysis of optimal tuning functions explains phenomena observed in sensory neurons. Front. Comput. Neurosci., 4, 130. doi:10.3389/fncom.2010.00130