Abstract

Neural decoding may be formulated as dynamic state estimation (filtering) based on point-process observations, a generally intractable problem. Numerical sampling techniques are often practically useful for the decoding of real neural data. However, they are less useful as theoretical tools for modeling and understanding sensory neural systems, since they lead to limited conceptual insight into optimal encoding and decoding strategies. We consider sensory neural populations characterized by a distribution over neuron parameters. We develop an analytically tractable Bayesian approximation to optimal filtering based on the observation of spiking activity that greatly facilitates the analysis of optimal encoding in situations deviating from common assumptions of uniform coding. Continuous distributions are used to approximate large populations with few parameters, resulting in a filter whose complexity does not grow with population size and allowing optimization of population parameters rather than individual tuning functions. Numerical comparison with particle filtering demonstrates the quality of the approximation. The analytic framework leads to insights that are difficult to obtain from numerical algorithms and is consistent with biological observations about the distribution of sensory cells' preferred stimuli.

1  Introduction

Populations of sensory neurons encode information about the external world through their spiking activity. To understand this encoding, it is natural to model it as an optimal or near-optimal code in the context of some task performed by higher brain regions, using performance criteria such as decoding error or motor performance. A Bayesian theory of neural decoding is useful to characterize optimal encoding, as the computation of performance criteria typically involves the posterior distribution of the world state conditioned on spiking activity.

We model the external world state as a random process, observed through a set of sensory neuron-like elements characterized by multidimensional tuning functions, representing the elements' average firing rate (see Figure 1). The actual firing of each cell is random and is given by a point process (PP) with rate determined by the external state and the cell's tuning function (Dayan & Abbott, 2005). Under this model, decoding of sensory spike trains may be formulated as a filtering problem based on PP observations, thus falling within the purview of nonlinear filtering theory (Snyder & Miller, 1991; Brémaud, 1981). Inferring the hidden state under such circumstances has been widely studied within the computational neuroscience literature (Dayan & Abbott, 2005; Macke, Buesing, & Sahani, 2015). Beyond neuroscience, PP-based filtering has been used for position sensing and tracking in optical communication (Snyder, Rhodes, & Hoversten, 1977), control of computer communication networks (Segall, 1978), queuing (Brémaud, 1981), and econometrics (Frey & Runggaldier, 2001).

Figure 1:

Problem setting.


A significant amount of work has been devoted in recent years to the development of algorithms for fast approximation of the posterior distribution, leading to an extensive literature (see Macke et al., 2015, for a recent review). Much of this work is devoted to the development of effective sampling techniques, leading to highly performing finite-dimensional filters that can be applied profitably to real neural data. These approaches are usually formulated in discrete time, as befits implementation on digital computers, and lead to complex mathematical expressions for the posterior distributions, which are difficult to interpret qualitatively. In this work, we are less concerned with algorithmic issues and more with establishing closed-form analytic expressions for approximately optimal continuous time filters and using these to characterize the nature of near-optimal encoders, namely, determining the structure and distribution of tuning functions for optimal state inference. A significant advantage of the closed-form expressions over purely numerical techniques is the insight and intuition that is gained from them about qualitative aspects of the system. Moreover, the leverage gained by the analytic computation contributes to reducing the variance inherent in Monte Carlo approaches. Thus, in this work, we do not compare our results to algorithmically oriented discrete-time filters but, rather, to other continuous-time analytically expressible filters for dynamically varying signals, with the aim of gaining insight into optimal decoding and encoding within an analytic framework.

The problem of filtering a continuous-time diffusion process through PP observations is solved formally under general conditions in Snyder (1972; see also Segall, 1976, and Solo, 2000), where a stochastic partial differential equation (PDE) for the infinite-dimensional posterior state distribution is derived. However, this PDE is intractable in general and not easily amenable to qualitative or even numerical analysis. Some previous work has derived exact or approximate finite-dimensional filters under various simplifying assumptions. In many of these works (e.g., Komaee, 2010; Rhodes & Snyder, 1977; Susemihl, Meir, & Opper, 2013; Twum-Danso & Brockett, 2001; Yaeli & Meir, 2010), the tuning functions are chosen so that the total firing rate (i.e., the sum of firing rates of all neurons) is independent of the state, an assumption we refer to as uniform coding (see Figure 2). Rhodes and Snyder (1977) derived an exact finite-dimensional filter for the case of linear dynamics with gaussian noise observed through uniform coding with gaussian tuning functions.1 Komaee (2010) considered the more general setting of uniform coding with arbitrary tuning functions, where an approximate filter is obtained.

Figure 2:

The uniform coding property, $\sum_i \lambda_i(x) = \mathrm{const}$, holds approximately in homogeneous populations with tuning function centers located on a dense uniform grid, as demonstrated in (a) a one-dimensional case. (b) Nonuniform coding.


Other work derives the posterior distribution for non-Markovian state dynamics, modeled as gaussian processes. Huys, Zemel, Natarajan, and Dayan (2007) derive the posterior exactly, but its computation is not recursive, requiring memory of the entire spike history. A recursive version for gaussian processes with a Matérn kernel autocorrelation is derived in Susemihl, Meir, and Opper (2011). Both of these works assume uniform coding with gaussian tuning functions.

For reasons of mathematical tractability, few previous analytically oriented works studied neural decoding without the uniform coding assumption in spite of the experimental importance and relevance of nonuniform coding. We discuss such work in comparison to our work in section 6.

The problem of optimal encoding by neural populations has been studied mostly in the static case. A natural optimality criterion is the estimation mean square error (MSE). Some work (e.g., Harper & McAlpine, 2004; Ganguli & Simoncelli, 2014) optimizes Fisher information, which serves as a proxy to the MSE of unbiased estimators through the Cramér-Rao bound (Radhakrishna Rao, 1945) or, in the Bayesian setting, the Van Trees inequality (Gill & Levit, 1995). Fisher information of neural spiking activity is easy to compute analytically, at least in the static case (Dayan & Abbott, 2005), and it can be used without solving the decoding problem. This approach has been used to study nonuniform coding of static stimuli by heterogeneous populations in many works, including Chelaru and Dragoi (2008), Ecker, Berens, Tolias, and Bethge (2011), and Ganguli and Simoncelli (2014). However, optimizing Fisher information may yield misleading qualitative results regarding the MSE-optimal encoding (Bethge, Rotermund, & Pawelzik, 2002; Pilarski & Pokora, 2015; Yaeli & Meir, 2010). Although under appropriate conditions the inverse of Fisher information approaches the minimum attainable MSE in the limit of infinite decoding time, it may be a poor proxy for the MSE for finite decoding times, which are of particular importance in natural settings and control problems. Exact computation of the estimation MSE is possible in some restricted settings; some work along those lines is discussed in section 6.2.

A possible alternative is the computation of estimation MSE for a given filter through Monte Carlo simulations. This approach is complicated by high variability between trials, which means many trials are necessary for each value of the parameters. Consequently, optimization becomes very time-consuming, and possibly impractical, when using numerical filters such as particle filtering or large neural populations with many parameters.

In this work, we derive an approximate online filter for the neural decoding problem in continuous time and demonstrate its use in investigating optimal neural encoding. We consider neural populations characterized by a distribution over neuron parameters. Continuous distributions are used to approximate large populations with few parameters, resulting in a filter whose complexity does not grow with the population size and allowing optimization of population parameters rather than individual tuning functions. We suggest further reducing computational complexity for the encoding problem by using the estimated posterior variance as an approximation to estimation MSE, as discussed in appendix C.

Technically, given the intractable infinite-dimensional nature of the posterior distribution, we use a projection method replacing the full posterior at each point in time by a projection onto a simple family of distributions (gaussian in our case). This approach, originally developed in the filtering literature (Brigo, Hanzon, Le Gland, 1999; Maybeck, 1979) and termed assumed density filtering (ADF), has been successfully used more recently in machine learning (Opper, 1998; Minka, 2001). We derive approximate filters for gaussian tuning functions and for several distributions over tuning function centers, including the case of a finite population. These filters may be combined to obtain a filter for heterogeneous mixtures of homogeneous subpopulations. We are not aware of any previous work providing an effective closed-form filter for heterogeneous populations of sensory neurons characterized by a small number of parameters.

The main contributions of this letter are as follows:

  • Derivation of closed-form recursive expressions for the continuous-time posterior mean and variance within the ADF approximation, in the context of large nonuniform populations characterized by a small number of parameters

  • Demonstration of the quality of the ADF approximation by comparison to state-of-the-art particle filtering methods

  • Characterization of optimal adaptation (encoding) for sensory cells in a more general setting than hitherto considered (nonuniform coding, dynamic signals)

  • Demonstration of the interesting interplay between prior information and neuronal firing, showing how in certain situations, the absence of spikes can be highly informative (this phenomenon is absent under uniform coding).

Preliminary results discussed in this letter were presented at a conference (Harel, Meir, & Opper, 2015). These included only the special case of gaussian distribution of preferred stimuli. This letter provides a more general and rigorous formulation of the mathematical framework. By separately considering different terms in the approximate filter, we find that updates at spike time depend only on the tuning function of the spiking neuron and apply generally to any population distribution. For the updates between spikes, we provide closed-form expressions for cases that were not discussed in Harel et al. (2015)—namely, uniform populations on an interval and finite heterogeneous mixtures—and non-closed-form expressions for the general case, given as integrals involving the distribution of neuron parameters. We further supplement our previously published results with numerical evaluation of the filter's accuracy, an additional example application, and a detailed comparison with previous work.

2  Problem Overview

Consider a dynamical system with state $X_t \in \mathbb{R}^n$, observed through the firing patterns of $M$ sensory neurons, as illustrated in Figure 1. Each neuron fires stochastically and independently, with the $i$th neuron having firing rate $\lambda_i(X_t)$. (More detailed assumptions about the dynamics of the state and observation processes are described in later sections.) In this context, we are interested in the question of optimal encoding and decoding. By decoding, we mean computing (exactly or approximately) the full posterior distribution of $X_t$ given $\mathcal{N}_t$, which is the history of neural spikes up to time $t$. The problem of optimal encoding is then the problem of optimal sensory cell configuration: finding the optimal rate functions $\{\lambda_i(\cdot)\}_{i=1}^M$ so as to minimize some performance criterion. We assume the set $\{\lambda_i\}_{i=1}^M$ belongs to some parameterized family with parameter $\phi$.

To quantify the performance of the encoding-decoding system, we summarize the result of decoding using a single estimator $\hat{X}_t = \hat{X}_t(\mathcal{N}_t)$, and define the mean square error (MSE) as $\epsilon_t \triangleq \mathrm{trace}[(X_t - \hat{X}_t)(X_t - \hat{X}_t)^T]$. We seek $\hat{X}_t$ and $\phi$ that solve $\min_\phi \min_{\hat{X}_t} E[\epsilon_t] = \min_\phi E[\min_{\hat{X}_t} E[\epsilon_t \mid \mathcal{N}_t]]$. The inner minimization problem in this equation is solved by the MSE-optimal decoder, which is the posterior mean $\hat{X}_t = \mu_t \triangleq E[X_t \mid \mathcal{N}_t]$. The posterior mean may be computed from the full posterior obtained by decoding. The outer minimization problem is solved by the optimal encoder. If decoding is exact, the problem of optimal encoding becomes that of minimizing the expected posterior variance. Note that although we assume a fixed parameter $\phi$ that does not depend on time, the optimal value of $\phi$ for which the minimum is obtained generally depends on the time $t$ where the error is to be minimized. In principle, the encoding/decoding problem can be solved for any value of $t$. In order to assess performance, it is convenient to consider the steady-state limit $t \to \infty$ for the encoding problem.

Below, we approximately solve the decoding problem for any t. We then explore the problem of choosing the optimal encoding parameters ϕ using Monte Carlo simulations in examples motivated by experimental results.

Having an efficient (closed-form) approximate filter allows performing the Monte Carlo simulation at a significantly reduced computational cost relative to numerical methods such as particle filtering. The computational cost is further reduced by averaging the computed posterior variance across trials rather than the squared error, thereby requiring fewer trials. The mean of the posterior variance equals the MSE (of the posterior mean) but has the advantage of being less noisy than the squared error itself, since, by definition, it is the mean of the squared error under conditioning on $\mathcal{N}_t$.

3  Decoding

3.1  A Finite Population of Gaussian Neurons

For ease of exposition, we first formulate the problem for a finite population of neurons. We address a more general setting in subsequent sections.

3.1.1  State and Observation Model

The observed process $X$ is a diffusion process obeying the stochastic differential equation (SDE),2
$dX_t = A(X_t)\,dt + D(X_t)\,dW_t, \qquad t \ge 0,$
(3.1)
where $A(\cdot), D(\cdot)$ are arbitrary functions such that equation 3.1 has a unique solution3 and $W_t$ is a standard Wiener process whose increments are independent of the history of all other random processes. The integral with respect to $dW_t$ is to be interpreted in the Ito sense. The initial condition $X_0$ is assumed to have a continuous distribution with a known density.
The observation processes are $\{N^i\}_{i=1}^M$, where $N_t^i$ is the spike count of the $i$th neuron up to time $t$. Denote by $\mathcal{N}_t = (N_{0,t}^i)_{i=1}^M$ the history of neural spikes up to time $t$ and by $N_t = \sum_{i=1}^M N_t^i$ the total number of spikes up to time $t$ from all neurons. We assume that the $i$th neuron fires with rate $\lambda_i(X_t)$ at time $t$, independent of other neurons given the state history $X_{0,t}$. More explicitly, this means
$P\left( N_{t+h}^i - N_t^i = k \,\middle|\, X_{0,t}, \mathcal{N}_t \right) = \begin{cases} \lambda_i(X_t)h + o(h), & k = 1, \\ 1 - \lambda_i(X_t)h + o(h), & k = 0, \\ o(h), & k > 1, \end{cases} \qquad i = 1, \dots, M, \quad h \to 0^+,$
(3.2)
where $X_{0,t}$ denotes the history up to time $t$ of $X$ and $o(h)$ is little-o asymptotic notation, denoting any function satisfying $o(h)/h \to 0$ as $h \to 0^+$. Thus, each $N^i$ is a doubly stochastic Poisson process (DSPP; see, e.g., Snyder & Miller, 1991) with rate process $\lambda_i(X_t)$.4
To achieve mathematical tractability, we assume that the tuning functions $\lambda_i$ are gaussian. The firing rate of the $i$th neuron in response to state $x$ is given by
$\lambda_i(x) = h_i \exp\left( -\tfrac{1}{2} \left\| H_i x - \theta_i \right\|_{R_i}^2 \right),$
(3.3)
where $\theta_i \in \mathbb{R}^m$ is the neuron's preferred location, $h_i \in \mathbb{R}_+$ is the neuron's maximal expected firing rate, $H_i \in \mathbb{R}^{m \times n}$ and $R_i \in \mathbb{R}^{m \times m}$, $m \le n$, are fixed matrices, each $R_i$ is positive definite, and the notation $\|y\|_M^2$ denotes $y^T M y$. The inclusion of the matrix $H_i$ allows using high-dimensional models where only some dimensions are observed, for example, when the full state includes velocities but only locations are directly observable. In typical applications, $H_i$ would be the same across all neurons or at least across all neurons of the same sensory modality.
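To make the observation model concrete, the following sketch (our illustration, not part of the original text) simulates a scalar state obeying $dX_t = -X_t\,dt + dW_t$ observed by two gaussian neurons, using an Euler time discretization of equation 3.2; all numerical parameter values are hypothetical.

```python
import numpy as np

def gaussian_rate(x, h, theta, alpha2):
    # Scalar gaussian tuning function, equation 3.3, with H = 1, R = 1/alpha2.
    return h * np.exp(-0.5 * (x - theta) ** 2 / alpha2)

def simulate_dspp(T=2.0, dt=1e-3, seed=0):
    # Euler discretization: in each step of length dt, neuron i spikes with
    # probability lambda_i(X_t) * dt (equation 3.2, to first order in dt).
    rng = np.random.default_rng(seed)
    thetas = np.array([-1.2, 1.2])     # hypothetical preferred stimuli
    hs = np.array([10.0, 5.0])         # hypothetical peak rates
    alpha2 = 0.5                       # hypothetical tuning width
    x = rng.normal(0.0, np.sqrt(0.5))  # steady-state initial condition
    spike_times = [[], []]
    for k in range(int(T / dt)):
        x += -x * dt + np.sqrt(dt) * rng.standard_normal()  # dX = -X dt + dW
        for i in range(2):
            if rng.random() < gaussian_rate(x, hs[i], thetas[i], alpha2) * dt:
                spike_times[i].append(k * dt)
    return spike_times
```

The returned spike times, together with the indices of the firing neurons, constitute the observation history $\mathcal{N}_t$ available to the decoder.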
In the sequel, we use the following standard notation,
$\int_a^b h(t)\,dN_t^i \triangleq \sum_j \mathbf{1}\{t_j^i \in (a, b]\}\, h(t_j^i),$
(3.4)
for any function $h$, where $t_j^i$ is the time of the $j$th point of the process $N^i$. This is the usual Lebesgue integral of $h$ with respect to $N^i$ viewed as a discrete measure.

3.1.2  Model Limitations

The model outlined above involves several simplifications to achieve tractability. Namely, tuning functions are assumed to be gaussian, and firing rates are assumed to be independent of state history and spike history given the current state (see equation 3.2), yielding a DSPP model (see note 4).

Gaussian tuning functions are a reasonable model for some neural systems but inadequate for others, for example, where the tuning is sigmoidal or there is a baseline firing rate regardless of stimulus value. For simplicity, we focus on the gaussian case in this work. It is straightforward to extend the derivation presented here to piecewise linear tuning, which may be used to represent sigmoidal tuning functions, but the resulting expressions are more cumbersome. We have also developed closed-form results for tuning functions given by sums of gaussians; however, these require further approximations in order to obtain analytic results and are not discussed in this letter.

The assumption of history-independent rates may also limit the model's applicability. Real sensory neurons exhibit firing history dependence in the form of refractory periods and rate adaptation (Dayan & Abbott, 2005), state history dependence such as input integration (Dayan & Abbott, 2005), or correlations between the firing of different neurons conditioned on the state (Pillow et al., 2008). These phenomena are captured by some encoding models, such as simple integrate-and-fire models, as well as more complex physiological models like the Hodgkin-Huxley model. However, the simplifying independence assumptions above are common to all work presenting closed-form, continuous-time filters for point-process observations that we are aware of.

Note that characterization of the point processes in terms of their history-conditioned firing rate, as opposed to finite-dimensional distributions, does not in itself restrict the model's generality in any substantial way (see Segall & Kailath, 1975, theorem 1). Rather, the independence assumptions are expressed rigorously by the fact that the right-hand side of equation 3.2 depends on neither previous values of X nor previous spike times of any neuron. Some of our analysis applies without modification when rates are allowed to depend on spiking history (specifically, equations 3.6), so it may be possible to extend these techniques to some history-dependent models. However, when rates may depend on the state history, exact filtering may involve the posterior distribution of the entire state history rather than the current state, so that a different approach is probably required.

3.1.3  Exact Filtering Equations

Let $p_t^N(\cdot)$ be the posterior density of $X_t$ given the firing history $\mathcal{N}_t$, and $E_t^N[\cdot]$ the posterior expectation given $\mathcal{N}_t$. The prior density $p_0^N$ is assumed to be known. We denote by $\hat{\lambda}_t^i$ the rate of $N_t^i$ with respect to the history of spiking only, that is, the rate that would appear on the right-hand side of equation 3.2 if the conditioning on the left were only on $\mathcal{N}_t$. These rates are given by5
$\hat{\lambda}_t^i = E_t^N[\lambda_i(X_t)] = \int p_t^N(x) \lambda_i(x)\,dx.$
The problem of filtering a diffusion process $X$ from a doubly stochastic Poisson process driven by $X$ is formally solved in Snyder (1972), where the authors derive a stochastic PDE for the posterior density6
$dp_t^N(x) = \mathcal{L}^* p_t^N(x)\,dt + p_t^N(x) \sum_i \left( \frac{\lambda_i(x)}{\hat{\lambda}_t^i} - 1 \right) \left( dN_t^i - \hat{\lambda}_t^i\,dt \right),$
(3.5)
where $\mathcal{L}$ is the state's infinitesimal generator (Kolmogorov's backward operator), defined as $\mathcal{L}h(x) = \lim_{\Delta t \to 0^+} \left( E[h(X_{t+\Delta t}) \mid X_t = x] - h(x) \right)/\Delta t$, and $\mathcal{L}^*$ is $\mathcal{L}$'s adjoint operator (Kolmogorov's forward operator). The notation $dN_t^i$ is interpreted as in equation 3.4, so this term contributes a jump of size $p_t^N(x)\left( \lambda_i(x)/\hat{\lambda}_t^i - 1 \right)$ at a spike of the $i$th neuron. Equation 3.5 may be written in a notation more familiar for nonstochastic PDEs using Dirac delta functions,
$\frac{\partial}{\partial t} p_t^N(x) = \mathcal{L}^* p_t^N(x) + p_t^N(x) \sum_i \left( \frac{\lambda_i(x)}{\hat{\lambda}_t^i} - 1 \right) \left( \dot{N}_t^i - \hat{\lambda}_t^i \right),$
where $\dot{N}_t^i \triangleq \sum_j \delta(t - t_j^i)$ is the spike train of the $i$th neuron, which is the formal derivative of the process $N^i$. An accessible, albeit nonrigorous, derivation of equation 3.5 via time discretization is found in Susemihl (2014, sect. 2.3).

The stochastic PDE, equation 3.5, is nonlinear and nonlocal (due to the dependence of λ^ti on ptN) and therefore usually intractable. Rhodes and Snyder (1977) and Susemihl et al. (2014) consider linear dynamics with a gaussian prior and gaussian sensors with centers distributed uniformly over the state space. In this case, the posterior is gaussian, and equation 3.5 leads to closed-form ordinary differential equations (ODEs) for its mean and variance. In our more general setting, we can obtain exact equations for the posterior mean and variance, as follows.

Let $\mu_t \triangleq E_t^N[X_t]$, $\tilde{X}_t \triangleq X_t - \mu_t$, and $\Sigma_t \triangleq E_t^N[\tilde{X}_t \tilde{X}_t^T]$. Using equation 3.5, along with known results about the form of the infinitesimal generator $\mathcal{L}$ for diffusion processes (e.g., Øksendal, 2003, theorem 7.3.3), the first two posterior moments can be shown to obey the following exact equations (see appendix A):
$d\mu_t = E_t^N[A(X_t)]\,dt + \sum_i E_{t-}^N[\omega_{t-}^i X_{t-}] \left( dN_t^i - \hat{\lambda}_t^i\,dt \right),$
(3.6a)
$d\Sigma_t = E_t^N\!\left[ A(X_t)\tilde{X}_t^T + \tilde{X}_t A(X_t)^T + D(X_t)D(X_t)^T \right] dt + \sum_i E_{t-}^N[\omega_{t-}^i \tilde{X}_{t-} \tilde{X}_{t-}^T] \left( dN_t^i - \hat{\lambda}_t^i\,dt \right) - \sum_i E_{t-}^N[\omega_{t-}^i X_{t-}]\, E_{t-}^N[\omega_{t-}^i X_{t-}]^T\, dN_t^i,$
(3.6b)
where
$\omega_t^i \triangleq \frac{\lambda_i(X_t)}{\hat{\lambda}_t^i} - 1,$
and the expressions involving $t-$ denote left limits, which are necessary since the solutions to equation 3.6 are discontinuous at spike times.

In contrast with the more familiar case of linear dynamics with gaussian white noise and the corresponding Kalman-Bucy filter (Maybeck, 1979), here the posterior variance is random and is generally not monotonically decreasing, even when estimating a constant state. However, noting that $E[dN_t^i - \hat{\lambda}_t^i\,dt] = 0$, we may observe from equation 3.6b that for a constant state ($A = D = 0$), the expected posterior variance $E[\Sigma_t]$ is decreasing, since the first two terms in equation 3.6b vanish in expectation and the last term is negative semidefinite.

We will find it useful to rewrite equation 3.6 in a different form, as follows,
$d\mu_t = d\mu_t^\pi + d\mu_t^c + d\mu_t^N, \qquad d\Sigma_t = d\Sigma_t^\pi + d\Sigma_t^c + d\Sigma_t^N,$
where $d\mu_t^\pi, d\Sigma_t^\pi$ are the prior terms, corresponding to $\mathcal{L}^* p_t^N(x)$ in equation 3.5, and the remaining terms are divided into continuous update terms $d\mu_t^c, d\Sigma_t^c$ (multiplying $dt$) and discontinuous update terms $d\mu_t^N, d\Sigma_t^N$ (multiplying $dN_t^i$). Using equation 3.6, we find the exact equations:
$d\mu_t^\pi = E_t^N[A(X_t)]\,dt,$
(3.7a)
$d\Sigma_t^\pi = E_t^N\!\left[ A(X_t)\tilde{X}_t^T + \tilde{X}_t A(X_t)^T + D(X_t)D(X_t)^T \right] dt,$
(3.7b)
$d\mu_t^c = -\sum_i E_t^N[\omega_t^i X_t]\, \hat{\lambda}_t^i\,dt,$
(3.7c)
$d\Sigma_t^c = -\sum_i E_t^N[\omega_t^i \tilde{X}_t \tilde{X}_t^T]\, \hat{\lambda}_t^i\,dt,$
(3.7d)
$d\mu_t^N = \sum_i E_{t-}^N[\omega_{t-}^i X_{t-}]\,dN_t^i,$
(3.7e)
$d\Sigma_t^N = \sum_i \left( E_{t-}^N[\omega_{t-}^i \tilde{X}_{t-} \tilde{X}_{t-}^T] - E_{t-}^N[\omega_{t-}^i X_{t-}]\, E_{t-}^N[\omega_{t-}^i X_{t-}]^T \right) dN_t^i.$
(3.7f)
The prior terms $d\mu_t^\pi, d\Sigma_t^\pi$ represent the known dynamics of $X$ and are the same terms appearing in the Kalman-Bucy filter. These would be the only terms left if no measurements were available and would vanish for a static state. The continuous update terms $d\mu_t^c, d\Sigma_t^c$ represent updates to the posterior between spikes that are not derived from $X$'s dynamics and therefore may be interpreted as corresponding to information obtained from the absence of spikes. The discontinuous update terms $d\mu_t^N, d\Sigma_t^N$ contribute a change to the posterior at spike times depending on the spike's origin $i$ and thus represent information obtained from the presence of a spike, as well as the parameters of the spiking neuron.

Note that the gaussian tuning assumption 3.3 has not been used in this section, and equations 3.7 are valid for any form of $\lambda_i$.

3.1.4  ADF Approximation

While equations 3.7 are exact, they are not practical, since they require computation of posterior expectations $E_t^N[\cdot]$. To bring them to a closed form, we use ADF with an assumed gaussian density (see Opper, 1998, for details). Informally, this may be envisioned as integrating equation 3.7 while replacing the distribution $p_t^N$ by its approximating gaussian "at each time step." The approximating gaussian is obtained by matching the first two moments of $p_t^N$ (Opper, 1998). Note that the solution of the resulting equations does not in general match the first two moments of the exact solution, though it may approximate it. Practically, the ADF approximation amounts to substituting the normal distribution $\mathcal{N}(\mu_t, \Sigma_t)$ for $p_t^N$ to compute the expectations in equation 3.7. This heuristic may be justified by its relation to a projection method, where the right-hand side of the density PDE is projected onto the tangent space of the approximating family of densities: the two approaches are equivalent when the approximating family is exponential (Brigo et al., 1999).

If the dynamics are linear, the prior updates, equations 3.7a and 3.7b, are easily computed in closed form after this substitution. Specifically, for $dX_t = A X_t\,dt + D\,dW_t$, the prior updates read
$d\mu_t^\pi = A \mu_t\,dt,$
(3.8a)
$d\Sigma_t^\pi = \left( A \Sigma_t + \Sigma_t A^T + D D^T \right) dt,$
(3.8b)
as in the Kalman-Bucy filter (Maybeck, 1979). Otherwise, they may be approximated by expanding the nonlinear functions $A(x)$ and $D(x)D(x)^T$ as power series and applying the assumed gaussian density, resulting in tractable gaussian integrals. The use of ADF in the prior terms is outside the scope of this work (see, e.g., Maybeck, 1979, chap. 12).
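As a minimal sketch (ours, with hypothetical example matrices), one Euler step of the prior updates 3.8 for linear dynamics $dX_t = A X_t\,dt + D\,dW_t$ might read:

```python
import numpy as np

def prior_step(mu, Sigma, A, D, dt):
    # Euler step of equations 3.8a-3.8b (the Kalman-Bucy prediction terms):
    # d mu = A mu dt,  d Sigma = (A Sigma + Sigma A^T + D D^T) dt.
    mu_new = mu + A @ mu * dt
    Sigma_new = Sigma + (A @ Sigma + Sigma @ A.T + D @ D.T) * dt
    return mu_new, Sigma_new

# For A = -I, D = I, the stationary covariance 0.5*I solves
# A Sigma + Sigma A^T + D D^T = 0, so a prior step leaves it unchanged.
A, D = -np.eye(2), np.eye(2)
mu, Sigma = prior_step(np.array([1.0, 0.0]), 0.5 * np.eye(2), A, D, 1e-3)
```

In the absence of spikes and with no continuous update terms, iterating this step relaxes the posterior toward the stationary distribution of the dynamics.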

We therefore turn to the approximation of the nonprior updates, equations 3.7c to 3.7f, in the case of gaussian tuning, equation 3.3.

Abusing notation, from here on we use $\mu_t$, $\Sigma_t$, and $p_t^N(x)$ to refer to the ADF approximation rather than to the exact values. Applying the gaussian ADF approximation $p_t^N(x) \approx \mathcal{N}(x; \mu_t, \Sigma_t)$ in the case of gaussian tuning functions 3.3 yields the nonprior terms
$d\mu_t^c = \sum_i \Sigma_t H_i^T S_t^i \delta_t^i\, \hat{\lambda}_t^i\,dt,$
(3.9a)
$d\Sigma_t^c = \sum_i \Sigma_t H_i^T \left( S_t^i - S_t^i \delta_t^i (\delta_t^i)^T S_t^i \right) H_i \Sigma_t\, \hat{\lambda}_t^i\,dt,$
(3.9b)
$d\mu_t^N = -\sum_i \Sigma_{t-} H_i^T S_{t-}^i \delta_{t-}^i\,dN_t^i,$
(3.9c)
$d\Sigma_t^N = -\sum_i \Sigma_{t-} H_i^T S_{t-}^i H_i \Sigma_{t-}\,dN_t^i,$
(3.9d)
$\hat{\lambda}_t^i = h_i \sqrt{\frac{|S_t^i|}{|R_i|}} \exp\left( -\tfrac{1}{2} \|\delta_t^i\|_{S_t^i}^2 \right),$
(3.9e)
where
$\delta_t^i \triangleq H_i \mu_t - \theta_i, \qquad S_t^i \triangleq \left( R_i^{-1} + H_i \Sigma_t H_i^T \right)^{-1}.$
(3.9f)
These equations are a special case of equations 3.16, which are derived in appendix A. The updates for the posterior precision $\Sigma_t^{-1}$ have a simpler form, also derived in appendix A:
$d\Sigma_t^{-1,c} = -\sum_i H_i^T \left( S_t^i - S_t^i \delta_t^i (\delta_t^i)^T S_t^i \right) H_i\, \hat{\lambda}_t^i\,dt,$
(3.10a)
$d\Sigma_t^{-1,N} = \sum_i H_i^T R_i H_i\,dN_t^i.$
(3.10b)
In the scalar case $m = n = 1$, with $H = 1$, $\sigma_t^2 = \Sigma_t$, $\alpha_i^2 = R_i^{-1}$, the update equations 3.9 and 3.10 read
$d\mu_t^c = \sum_i \frac{\sigma_t^2}{\sigma_t^2 + \alpha_i^2} \left( \mu_t - \theta_i \right) \hat{\lambda}_t^i\,dt,$
(3.11a)
$d\sigma_t^{2,c} = \sum_i \frac{\sigma_t^2}{\sigma_t^2 + \alpha_i^2} \left( 1 - \frac{(\mu_t - \theta_i)^2}{\sigma_t^2 + \alpha_i^2} \right) \sigma_t^2\, \hat{\lambda}_t^i\,dt,$
(3.11b)
$d\sigma_t^{-2,c} = \sum_i \frac{1}{\sigma_t^2 + \alpha_i^2} \left( \frac{(\mu_t - \theta_i)^2}{\sigma_t^2 + \alpha_i^2} - 1 \right) \hat{\lambda}_t^i\,dt,$
(3.11c)
$d\mu_t^N = \sum_i \frac{\sigma_{t-}^2}{\sigma_{t-}^2 + \alpha_i^2} \left( \theta_i - \mu_{t-} \right) dN_t^i,$
(3.11d)
$d\sigma_t^{2,N} = -\sum_i \frac{\sigma_{t-}^2}{\sigma_{t-}^2 + \alpha_i^2}\, \sigma_{t-}^2\,dN_t^i,$
(3.11e)
$d\sigma_t^{-2,N} = \sum_i \alpha_i^{-2}\,dN_t^i,$
(3.11f)
$\hat{\lambda}_t^i = h_i \sqrt{2\pi\alpha_i^2}\; \mathcal{N}\!\left( \mu_t; \theta_i, \sigma_t^2 + \alpha_i^2 \right).$
(3.11g)
Figure 3 illustrates the filter 3.11 in a one-dimensional example.
Figure 3:

Filtering example with two sensory neurons. A dynamic state $dX_t = -X_t\,dt + dW_t$ is observed by two gaussian neurons, equation 3.3, and filtered according to equations 3.8 and 3.9. Each dot corresponds to a spike, with the vertical location indicating the firing neuron's preferred stimulus $\theta$, equal to $\pm 1.2$ in this case. The approximate posterior mean $\mu_t$ is shown in blue, with the posterior standard deviation $\sigma_t$ indicated by the shaded area. The curves to the right of the graph show the tuning functions of the two neurons. The bottom graph shows the posterior precision $\sigma_t^{-2}$. Neuron parameters are $\theta_1 = -1.2$, $\theta_2 = 1.2$, $h_1 = 10$, $h_2 = 5$, $H_i = 1$, $R_i^{-1} = \alpha_i^2 = 0.5$ ($i = 1, 2$). The process and the filter were both initialized from the steady-state distribution of the dynamics, which is $\mathcal{N}(0, 0.5)$. The dynamics were discretized with time step $\Delta t = 10^{-3}$.


3.1.5  Interpretation

To gain some insight into the filtering equations, we consider the discontinuous updates, equations 3.9c and 3.9d, and continuous updates, equations 3.9a and 3.9b, in some special cases, in reference to the example presented in Figure 3.

Discontinuous updates. Consider the case $H_i = I$. As seen from the discontinuous update equations 3.9c and 3.9d, when the $i$th neuron spikes, the posterior mean moves toward its preferred location $\theta_i$, and the posterior variance decreases (in the sense that $\Sigma_{t+} - \Sigma_{t-}$ is negative definite), as seen in Figure 3. Neither update depends on $h_i$.

For general $H_i \in \mathbb{R}^{m \times n}$ of full row rank, let $H_i^r$ be any right inverse of $H_i$ and $\theta_i' = H_i^r \theta_i$. Note that $H_i$ projects the state $X_t$ to the coordinates employed by the $i$th neuron, which we refer to as perceptual coordinates; thus, $\theta_i'$ may be interpreted as the tuning function center in state coordinates, whereas $\theta_i$ is in the neuron's perceptual coordinates. We may rewrite equation 3.3 in state coordinates as
$\lambda_i(x) = h_i \exp\left( -\tfrac{1}{2} \left\| x - \theta_i' \right\|_{H_i^T R_i H_i}^2 \right).$
Now, the updates for a spike of neuron $i$ at time $t$ can be written more intuitively (see appendix A) as
$\mu_{t+} = \left( \Sigma_{t-}^{-1} + H_i^T R_i H_i \right)^{-1} \left( \Sigma_{t-}^{-1} \mu_{t-} + H_i^T R_i \theta_i \right) = \left( \Sigma_{t-}^{-1} + H_i^T R_i H_i \right)^{-1} \left( \Sigma_{t-}^{-1} \mu_{t-} + H_i^T R_i H_i \theta_i' \right), \qquad \Sigma_{t+}^{-1} = \Sigma_{t-}^{-1} + H_i^T R_i H_i.$
(3.12)
Thus, the new posterior mean is a weighted average of the prespike posterior mean and the preferred stimulus in state coordinates. The posterior precision $\Sigma_t^{-1}$ increases by $H_i^T R_i H_i$, which is the tuning function precision matrix in state coordinates. This may be observed in Figure 3, where the posterior precision increases at each spike time by the fixed amount $R_i = \alpha_i^{-2} = 2$.
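The spike update 3.12 is simple to implement in precision form; the following is our sketch with arbitrary example matrices (the right inverse $H_i^r$ is implicit, since the update uses only $H$, $R$, and $\theta$).

```python
import numpy as np

def spike_update(mu, Sigma, H, R, theta):
    # Equation 3.12: at a spike of a neuron with parameters (H, R, theta),
    # the posterior precision increases by H^T R H, and the new mean is a
    # precision-weighted average of the prespike mean and the preferred
    # stimulus mapped to state coordinates.
    P = np.linalg.inv(Sigma)        # prespike precision Sigma^{-1}
    P_new = P + H.T @ R @ H         # precision update
    mu_new = np.linalg.solve(P_new, P @ mu + H.T @ R @ theta)
    return mu_new, np.linalg.inv(P_new)

# Example: a 2D state of which only the first coordinate is observed.
H = np.array([[1.0, 0.0]])          # perceptual projection (hypothetical)
R = np.array([[2.0]])               # tuning precision, R = alpha^{-2}
mu, Sigma = spike_update(np.zeros(2), 0.5 * np.eye(2), H, R, np.array([1.2]))
```

Here the spike leaves the second coordinate of the mean and variance unchanged, as the neuron carries no information about the unobserved dimension.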
Continuous updates. The continuous mean update, equation 3.9a, contributing between spiking events, also admits an intuitive interpretation in the case where all neurons share the same shape matrices $H_i = H$, $R_i = R$. In this case, the equation reads
$d\mu_t^c = \Sigma_t H^T \left( R^{-1} + H \Sigma_t H^T \right)^{-1} \left( H \mu_t - \sum_i \theta_i \nu_t^i \right) \left( \sum_i \hat{\lambda}_t^i \right) dt,$
where $\nu_t^i \triangleq \hat{\lambda}_t^i / \sum_j \hat{\lambda}_t^j$. The normalized rates $\nu_t^i$ may be interpreted heuristically as the distribution of the next firing neuron's index $i$, provided the next spike occurs immediately (Brémaud, 1981, theorem T15). Thus, the absence of spikes drives the posterior mean away from the expected preferred stimulus of the next spiking neuron. The strength of this effect scales with $\sum_i \hat{\lambda}_t^i$, which is the total expected rate of spikes given the firing history. This behavior is qualitatively similar to the result obtained in Bobrowski, Meir, and Eldar (2009) for a finite population of neurons observing a continuous-time finite-state Markov process, where the posterior probability between spikes concentrates on states with lower total firing rate.

This behavior may be observed in Figure 3. When the posterior mean μt is near a neuron's preferred stimulus, it moves away from it between spikes as the next spike is expected from that neuron. Similarly, despite the symmetry of the two neurons' preferred stimuli relative to the starting estimate μ0, the posterior mean shifts at the start of the trial toward the preferred stimulus of the second neuron due to its lower firing rate.

The continuous variance update, equation 3.9b, consists of the difference of two positive semidefinite terms and, accordingly, the posterior variance may increase or decrease between spikes along various directions. In Figure 3, the posterior variance decreases before the first spike and increases between spikes afterward.

3.2  Continuous Population Approximation

3.2.1  Motivation

The filtering equations 3.9a to 3.9f implement sensory decoding for nonuniform populations. However, their applicability to studying neural encoding in large heterogeneous populations is limited for two closely related reasons. First, the computational cost of the filter is linear in the number of neurons. Second, the size of the parameter space describing the population is also linear in the number of neurons, making optimization of large populations computationally costly. These traits are shared with other filters designed for heterogeneous populations, namely Bobrowski et al. (2009), Eden and Brown (2008), and Twum-Danso and Brockett (2001).

To reduce the parameter space and simplify the filter, we approximate large neural populations by an infinite continuous population, characterized by a distribution over neuron parameters. These distributions are described by few parameters, resulting in a filter whose complexity does not grow with the population size and allowing optimization of population parameters rather than individual tuning functions.

For example, consider a homogeneous population of neurons with preferred stimuli equally spaced on an interval, as depicted in Figure 4a. If the population is large, its firing pattern statistics may be modeled as an infinite population of neurons, with preferred stimuli uniformly distributed on the same interval, as in Figure 4c. In this continuous population model, each spike is characterized by the preferred stimulus of the firing neuron, which is a continuous variable, rather than by the neuron's index. Such a population is parameterized by only two variables representing the end points of the interval, in addition to the tuning function height and width parameters. There is no need for a parameter representing the density of neurons on the interval, as scaling the density of neurons is equivalent to identical scaling of each neuron's maximum firing rate.

Figure 4:

Finite (discrete) and infinite (continuous) population models. (a) A finite population of six neurons with preferred stimuli θi (i=0,,5). The tuning functions are depicted on the left. The spiking pattern of each neuron in response to the stimulus Xt (orange line) is represented by a separate point process Nti (blue vertical ticks). (b) A representation of the same neural response as a single marked point process. Each blue dot represents a spike with the vertical location representing its mark θ. (c) A continuous population model approximating the statistics of the same population. The approximation would be better for larger populations. The population density fdθ/dθ is depicted to the left of the plot, and the tuning function corresponding to one of the spikes on the right.

3.2.2  Marked Point Processes as Continuous Population Models

We now change the mathematical formulation and notation of our model to accommodate parameterized continuous populations. The new formulation is more general and includes finite populations as a special case. Rather than using a sequence of tuning function parameters, such as $(y_i)_{i=1}^M = (h_i,\theta_i,H_i,R_i)_{i=1}^M$ in the case of gaussian neurons (equation 3.3), we characterize the population by a measure $f(dy)$, where $f(Y)$ counts the neurons with parameters in a set $Y$, up to some multiplicative constant. A continuous measure $f$ may be used to approximate a large population. Accordingly, we write $\lambda(x;y)$ for the tuning function of a neuron with parameters $y$, in lieu of the previous notation $\lambda_i(x)$. For example, the gaussian case, equation 3.3, takes the form
$$\lambda(x;y) = \lambda(x;h,\theta,H,R) \triangleq h\exp\left(-\tfrac{1}{2}\left\|Hx-\theta\right\|_R^2\right),$$
(3.13)
with $y=(h,\theta,H,R)$ the parameters of the gaussian tuning function.
Similarly, instead of the observation processes $\{N^i\}_{i=1}^M$ counting the spikes of each neuron, we describe the spikes of all neurons using a single marked point process $N$ (Snyder & Miller, 1991), which is a random sequence of pairs $(t_k, y_k)$, where $t_k \in [0,\infty)$ is the time of the $k$th point and $y_k \in \mathbf{Y}$ its mark. In our case, $t_k$ is the $k$th spike time, and $y_k \in \mathbf{Y}$ are the parameters of the spiking neuron. Alternatively, $N$ may be described as a random discrete measure, where $N((s,t]\times Y)$ is the number of spikes in the time interval $(s,t]$ from neurons with parameters in the set $Y$. In line with the discrete measure view, we write, for an arbitrary function $h$,
$$\int_{(s,t]\times Y} h(\tau,y)\,N(d\tau,dy) = \sum_k 1_{\{s< t_k\le t,\;y_k\in Y\}}\, h(t_k,y_k),$$
which is the ordinary Lebesgue integral of $h$ with respect to the discrete measure $N$. We use the notation $N_t(Y) \triangleq N([0,t]\times Y)$, and when $Y=\mathbf{Y}$, we omit it and write $N_t$ for the total number of spikes up to time $t$. As before, $\mathcal{N}_t$ denotes the history up to time $t$, here including both spike times and marks.
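For concreteness, the discrete-measure view can be sketched in code: a realization of the marked point process is just a list of (time, mark) pairs, and both $N((s,t]\times Y)$ and integrals against $N$ reduce to sums over the qualifying spikes. The example spikes and function names below are illustrative only:

```python
# A realization of a marked point process: a list of (time, mark) pairs.
# The spike times and marks below are illustrative only.
spikes = [(0.12, -1.0), (0.35, 0.4), (0.36, 0.4), (0.80, 1.3)]

def N(s, t, marks):
    """N((s, t] x Y): number of spikes in (s, t] whose mark satisfies
    the predicate `marks` (representing membership in the set Y)."""
    return sum(1 for (tk, yk) in spikes if s < tk <= t and marks(yk))

def integrate(h, s, t, marks):
    """Integral of h(tau, y) against the discrete measure N over (s, t] x Y:
    a sum of h over the qualifying spikes."""
    return sum(h(tk, yk) for (tk, yk) in spikes if s < tk <= t and marks(yk))
```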

Figure 4 illustrates how the activity of a neural population may be represented as a marked point process and how the firing statistics are approximated by a continuous population. In this case, a homogeneous population of neurons with equally spaced preferred stimuli is approximated by a continuous population with uniformly distributed preferred stimuli.

To characterize the statistics of $N$, we first consider the finite population case. In this case, $f=\sum_i \delta_{y_i}$, where $\delta_{y_i}$ is the point mass at $y_i$, and the rate of points with marks in a set $Y\subseteq\mathbf{Y}$ at time $t$ (conditioned on $\mathcal{N}_t, X_{0,t}$) is
$$\sum_{i:\, y_i\in Y}\lambda(X_t;y_i) = \int_Y \lambda(X_t;y)\,f(dy).$$
In the case of a general population distribution $f$, we similarly take the integral $\int_Y \lambda(X_t;y)\,f(dy)$ as the rate (or intensity) of points with marks in $Y$ at time $t$ conditioned on $(\mathcal{N}_t, X_{0,t})$. The random measure $\lambda(X_t;y)\,f(dy)$ appearing in this integral is termed the intensity kernel of the marked point process $N$ with respect to the history $(\mathcal{N}_t, X_{0,t})$ (e.g., Brémaud, 1981). The dynamics of $N$ may be described heuristically by means of the intensity kernel as
$$E\left[N(dt,dy)\,|\,X_{0,t},\mathcal{N}_t\right] = \lambda(X_t;y)\,f(dy)\,dt.$$
(3.14)
The intensity kernel with respect to $\mathcal{N}_t$ alone is given by
$$E\left[N(dt,dy)\,|\,\mathcal{N}_t\right] = \hat\lambda_t(y)\,f(dy)\,dt,$$
(3.15)
where
$$\hat\lambda_t(y) \triangleq E\left[\lambda(X_t;y)\,|\,\mathcal{N}_t\right].$$
We denote the rate of the unmarked process $N_t$ with respect to $\mathcal{N}_t$ (i.e., the total posterior expected firing rate) by
$$\hat\lambda_t^f \triangleq \int \hat\lambda_t(y)\,f(dy).$$

3.2.3  Filtering

Assume gaussian tuning functions, equation 3.13. Writing $y=(h,\theta,H,R)$, the filtering equations 3.9a to 3.9e take the form
$$d\mu_t^c = \int_{\mathbf{Y}} \Sigma_t H^\top S_t(H,R)\,\delta_t(H,\theta)\,\hat\lambda_t(y)\,f(dy)\,dt,$$
(3.16a)
$$d\Sigma_t^c = \int_{\mathbf{Y}} \Sigma_t H^\top\left[S_t(H,R) - S_t(H,R)\,\delta_t(H,\theta)\,\delta_t(H,\theta)^\top S_t(H,R)\right] H\Sigma_t\,\hat\lambda_t(y)\,f(dy)\,dt,$$
(3.16b)
$$d\mu_t^N = -\int_{\mathbf{Y}} \Sigma_{t^-} H^\top S_{t^-}(H,R)\,\delta_{t^-}(H,\theta)\,N(dt,dy),$$
(3.16c)
$$d\Sigma_t^N = -\int_{\mathbf{Y}} \Sigma_{t^-} H^\top S_{t^-}(H,R)\, H\Sigma_{t^-}\,N(dt,dy),$$
(3.16d)
$$\delta_t(H,\theta) \triangleq H\mu_t - \theta,$$
(3.16e)
$$S_t(H,R) \triangleq \left(R^{-1} + H\Sigma_t H^\top\right)^{-1},$$
(3.16f)
$$\hat\lambda_t(y) = \hat\lambda_t(h,\theta,H,R) = h\sqrt{\frac{|S_t(H,R)|}{|R|}}\exp\left(-\tfrac{1}{2}\left\|\theta - H\mu_t\right\|^2_{S_t(H,R)}\right),$$
(3.16g)
and the posterior precision updates, equations 3.10a and 3.10b, become
$$d\Sigma_t^{-1,c} = -\int_{\mathbf{Y}} H^\top\left[S_t(H,R) - S_t(H,R)\,\delta_t(H,\theta)\,\delta_t(H,\theta)^\top S_t(H,R)\right] H\,\hat\lambda_t(y)\,f(dy)\,dt,$$
(3.16h)
$$d\Sigma_t^{-1,N} = \int_{\mathbf{Y}} H^\top R H\,N(dt,dy).$$
(3.16i)
The derivations of these equations, as well as filtering equations for special cases considered in this section, are in appendix A.

Using equation 3.16g, we may evaluate the continuous update equations 3.16a and 3.16b in closed form for some specific forms of the population distribution f. Note that the discontinuous update equations 3.16c and 3.16d do not depend on f and are already in closed form. We now consider several population distributions where the continuous updates may be brought to closed form.

Single neuron. The result for a single neuron with parameters $(h,\theta,H,R)$ is trivial to obtain from equations 3.16a and 3.16b, yielding
$$d\mu_t^c = \Sigma_t H^\top S_t(H,R)\left(H\mu_t-\theta\right)\hat\lambda_t^f\,dt,$$
(3.17a)
$$d\Sigma_t^c = \Sigma_t H^\top\left[S_t(H,R) - S_t(H,R)\left(H\mu_t-\theta\right)\left(H\mu_t-\theta\right)^\top S_t(H,R)\right]H\Sigma_t\,\hat\lambda_t^f\,dt,$$
(3.17b)
where $\hat\lambda_t^f = \hat\lambda_t(y)$ as given by equation 3.16g, and $S_t(H,R)$ is defined in equation 3.16f.
Uniform population. Here all neurons share the same height $h$ and shape matrices $H,R$, whereas the location parameter $\theta$ covers $\mathbb{R}^m$ uniformly: $f(dh',d\theta,dH',dR') = \delta_h(dh')\,\delta_H(dH')\,\delta_R(dR')\,d\theta$, where $\delta_x$ is the Dirac measure at $x$, $\delta_x(A)=1_{\{x\in A\}}$, and $d\theta$ denotes the Lebesgue measure in the parameter $\theta$. A straightforward calculation from equations 3.16a, 3.16b, and 3.16g yields
$$d\mu_t^c = 0,\qquad d\Sigma_t^c = 0,$$
(3.18)
in agreement with the (exact) result obtained in the uniform coding setting of Rhodes and Snyder (1977), where the filtering equations include only the prior term and the discontinuous update term.
Gaussian population. As in the uniform population case, we assume all neurons share the same height $h$ and shape matrices $H,R$ and differ only in the location parameter $\theta$. Abusing notation slightly, we write $f(dh',d\theta,dH',dR') = \delta_h(dh')\,\delta_H(dH')\,\delta_R(dR')\,f(d\theta)$, where the preferred stimuli are normally distributed,
$$f(d\theta) = N(\theta; c, \Sigma_{\mathrm{pop}})\,d\theta = (2\pi)^{-m/2}\,|\Sigma_{\mathrm{pop}}|^{-1/2}\exp\left(-\tfrac{1}{2}\left\|\theta-c\right\|^2_{\Sigma_{\mathrm{pop}}^{-1}}\right)d\theta,$$
(3.19)
for fixed $c\in\mathbb{R}^m$ and positive-definite $\Sigma_{\mathrm{pop}}$.

We take f to be normalized, since any scaling of f may be included in the coefficient h in equation 3.13, resulting in the same point process. Thus, when used to approximate a large population, the coefficient h would be proportional to the number of neurons.

The continuous updates for this case read
$$d\mu_t^c = \Sigma_t H^\top Z_t(H,R)\left(H\mu_t-c\right)\hat\lambda_t^f\,dt,$$
(3.20a)
$$d\Sigma_t^c = \Sigma_t H^\top\left[Z_t(H,R) - Z_t(H,R)\left(H\mu_t-c\right)\left(H\mu_t-c\right)^\top Z_t(H,R)\right]H\Sigma_t\,\hat\lambda_t^f\,dt,$$
(3.20b)
where
$$Z_t(H,R) \triangleq \left(\Sigma_{\mathrm{pop}} + R^{-1} + H\Sigma_t H^\top\right)^{-1},$$
(3.20c)
$$\hat\lambda_t^f \triangleq \int\hat\lambda_t(y)\,f(dy) = h\sqrt{\frac{|Z_t(H,R)|}{|R|}}\exp\left(-\tfrac{1}{2}\left\|H\mu_t-c\right\|^2_{Z_t(H,R)}\right).$$
(3.20d)
These updates generalize the single-neuron updates, equation 3.17, with the population center $c$ taking the place of the location parameter $\theta$ and $Z_t(H,R)$ substituting for $S_t(H,R)$. The single-neuron case is recovered when $\Sigma_{\mathrm{pop}}=0$.
It is illustrative to consider these equations in the scalar case $m=n=1$, with $H=1$. Letting $\sigma_t^2=\Sigma_t$, $\alpha^2=R^{-1}$, $\sigma_{\mathrm{pop}}^2=\Sigma_{\mathrm{pop}}$ yields
$$d\mu_t^c = \frac{\sigma_t^2}{\sigma_t^2+\alpha^2+\sigma_{\mathrm{pop}}^2}\left(\mu_t-c\right)\hat\lambda_t^f\,dt,$$
(3.21a)
$$d\sigma_t^{2,c} = \frac{\sigma_t^2}{\sigma_t^2+\alpha^2+\sigma_{\mathrm{pop}}^2}\left(1-\frac{(\mu_t-c)^2}{\sigma_t^2+\alpha^2+\sigma_{\mathrm{pop}}^2}\right)\sigma_t^2\,\hat\lambda_t^f\,dt,$$
(3.21b)
where
$$\hat\lambda_t^f = h\sqrt{2\pi\alpha^2}\,N\!\left(\mu_t;\, c,\, \sigma_t^2+\alpha^2+\sigma_{\mathrm{pop}}^2\right).$$

Figure 5a demonstrates the continuous update terms, equation 3.21, as a function of the current mean estimate $\mu_t$, for various values of the population variance $\sigma_{\mathrm{pop}}^2$, including the case of a single neuron, $\sigma_{\mathrm{pop}}^2=0$. The continuous update term $d\mu_t^c$ pushes the posterior mean $\mu_t$ away from the population center $c$ in the absence of spikes. This effect weakens as $|\mu_t-c|$ grows due to the factor $\hat\lambda_t^f$, consistent with the idea that far from $c$, the lack of events is less surprising, hence less informative. The continuous variance update term $d\sigma_t^{2,c}$ increases the variance when $\mu_t$ is near $c$ and otherwise decreases it. This stands in contrast with the Kalman-Bucy filter, where the posterior variance cannot increase when estimating a static state.
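A minimal sketch of the no-spike (continuous) updates of equation 3.21, for a scalar state observed through a gaussian population; function and variable names are ours, and the prior-dynamics terms are omitted:

```python
import math

def continuous_update(mu, var, c, alpha2, pop_var, h, dt):
    """No-spike (continuous) increments of equation 3.21 for a scalar state
    observed through a gaussian population centered at c.

    Returns (d_mu, d_var); the prior-dynamics contribution is omitted.
    Names are illustrative, not from the text."""
    v = var + alpha2 + pop_var  # sigma_t^2 + alpha^2 + sigma_pop^2
    # expected population firing rate (bottom line of equation 3.21)
    rate = h * math.sqrt(2.0 * math.pi * alpha2) * \
        math.exp(-0.5 * (mu - c) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    d_mu = var / v * (mu - c) * rate * dt
    d_var = (var / v) * (1.0 - (mu - c) ** 2 / v) * var * rate * dt
    return d_mu, d_var
```

Consistent with the discussion above, the mean increment drifts away from the population center $c$ in the absence of spikes, while the variance increment is positive near $c$ and negative far from it.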

Figure 5:

Continuous update terms as a function of the current posterior mean estimate, for a 1D state observed through a population of gaussian neurons, equation 3.3. The population density is gaussian on the left plot and uniform on the interval -1,1 on the right plot. The bottom plots show the population density fdθ/dθ and a tuning function λx;θ for θ=0.

3.2.4  Uniform Population on an Interval

In this case, we assume a scalar state, $n=m=1$, and
$$f(d\theta) = 1_{[a,b]}(\theta)\,d\theta,$$
(3.22)
where, similar to the gaussian population case, $h$ and $R$ are fixed. Unlike the gaussian case, here we find it more convenient not to normalize the distribution. Since the state is scalar, let $\sigma_t^2=\Sigma_t$, $\alpha^2=R^{-1}$. The continuous updates for this case are
$$d\mu_t^c = h\sqrt{2\pi\alpha^2}\,\frac{\sigma_t^2}{\sqrt{\sigma_t^2+\alpha^2}}\left[\phi(b_t') - \phi(a_t')\right]dt,$$
(3.23a)
$$d\sigma_t^{2,c} = h\sqrt{2\pi\alpha^2}\,\frac{\sigma_t^4}{\sigma_t^2+\alpha^2}\left[b_t'\,\phi(b_t') - a_t'\,\phi(a_t')\right]dt,$$
(3.23b)
where
$$a_t' \triangleq \frac{a-\mu_t}{\sqrt{\sigma_t^2+\alpha^2}},\qquad b_t' \triangleq \frac{b-\mu_t}{\sqrt{\sigma_t^2+\alpha^2}},\qquad \phi(x) \triangleq N(x;0,1),$$
and the posterior rates are given by
$$\hat\lambda_t(\theta) = h\sqrt{2\pi\alpha^2}\,N\!\left(\theta;\,\mu_t,\,\alpha^2+\sigma_t^2\right),\qquad \hat\lambda_t^f = h\sqrt{2\pi\alpha^2}\int_{a_t'}^{b_t'}\phi(x)\,dx.$$

Figure 5b demonstrates the continuous update terms, equation 3.23, as a function of the current mean estimate $\mu_t$. When the mean estimate is near an end point of the interval, the mean update $d\mu_t^c$ pushes the posterior mean outside the interval in the absence of spikes. The posterior variance $\sigma_t^2$ decreases outside the interval, where the absence of spikes is expected, and increases inside the interval, where it is unexpected. When the posterior mean is not near the interval end points, the updates are near zero, consistent with the uniform population case, equation 3.18.
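The interval-population updates of equation 3.23 are equally direct to evaluate. This sketch (our notation, assuming the update coefficients stated in equations 3.23a and 3.23b) illustrates the end-point behavior:

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def interval_updates(mu, var, a, b, alpha2, h, dt):
    """No-spike increments of equation 3.23 for a uniform population on [a, b].
    Names are ours; the prior-dynamics terms are omitted."""
    v = var + alpha2
    s = math.sqrt(v)
    a_p = (a - mu) / s  # a_t' of equation 3.23
    b_p = (b - mu) / s  # b_t'
    coef = h * math.sqrt(2.0 * math.pi * alpha2)
    d_mu = coef * var / s * (phi(b_p) - phi(a_p)) * dt
    d_var = coef * (var * var / v) * (b_p * phi(b_p) - a_p * phi(a_p)) * dt
    return d_mu, d_var
```

Deep inside a wide interval both increments vanish, recovering the uniform-population result of equation 3.18, while at an end point the mean is pushed outside the interval.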

Finite mixtures. Note that the continuous updates, equations 3.16a and 3.16b, are linear in $f$. Accordingly, if $f(dy)=\sum_i\alpha_i f_i(dy)$, where each $f_i$ is of one of the above forms, the updates are obtained by the appropriate weighted sums of the filters derived above for the various special forms of $f_i$. This form is quite general: it includes populations where $\theta$ is distributed according to a gaussian mixture, as well as heterogeneous populations with finitely many different values of the shape matrices $H,R$. The resulting filter includes a term for each component of the mixture.

4  Numerical Evaluation

Since the filter, equation 3.16, is based on an assumed density approximation, its results may be inexact. We tested the accuracy of the filter in the gaussian population case (see equation 3.20) by numerical comparison with particle filtering (PF; Doucet & Johansen, 2009).

Figure 6 shows two examples of filtering a one-dimensional process observed through a gaussian population (see equation 3.19) of gaussian neurons (see equation 3.3), using both the ADF approximation (see equation 3.20) and a PF for comparison. (See the figure caption for details.) Figure 7 shows the distribution of approximation errors and the deviation of the posterior from gaussian. The approximation errors plotted are the relative error in the mean estimate, $\epsilon_\mu \triangleq (\mu_{\mathrm{ADF}}-\mu_{\mathrm{PF}})/\sigma_{\mathrm{PF}}$, and the error in the posterior standard deviation estimate, $\epsilon_\sigma \triangleq (\sigma_{\mathrm{ADF}}-\sigma_{\mathrm{PF}})/\sigma_{\mathrm{PF}}$, where $\mu_{\mathrm{ADF}},\mu_{\mathrm{PF}}$ are the posterior means obtained from ADF and PF, respectively, and $\sigma_{\mathrm{ADF}},\sigma_{\mathrm{PF}}$ the corresponding posterior standard deviations. The deviation of the posterior distribution from gaussian is quantified using the Kolmogorov-Smirnov (KS) statistic $\sup_x|F(x)-G(x)|$, where $F$ is the particle distribution cdf and $G$ is the cdf of a gaussian matching $F$'s first two moments. For comparison, the orange lines in Figure 7 show the distribution of this KS statistic under the hypothesis that the particles are drawn independently from a gaussian, which is known as the Lilliefors distribution (see Lilliefors, 1967). As seen in the figure, the gaussian posterior assumption underlying the ADF approximation is quite accurate despite the fact that the population is nonuniform. Accordingly, approximation errors are typically of a few percent (see Table 1).
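The KS-based gaussianity check described above is straightforward to reproduce. The sketch below (our implementation) compares the empirical cdf of a sample against a gaussian cdf matched to the sample's first two moments:

```python
import math

def ks_vs_matched_gaussian(samples):
    """KS statistic sup_x |F(x) - G(x)| between the empirical cdf of
    `samples` and a gaussian G matching their first two moments (the
    setting in which the Lilliefors distribution applies).
    Function name is ours, not from the text."""
    n = len(samples)
    m = sum(samples) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in samples) / n)
    xs = sorted(samples)

    def G(x):
        # moment-matched gaussian cdf
        return 0.5 * (1.0 + math.erf((x - m) / (sd * math.sqrt(2.0))))

    d = 0.0
    for i, x in enumerate(xs):
        g = G(x)
        # empirical cdf jumps from i/n to (i+1)/n at x
        d = max(d, abs((i + 1) / n - g), abs(i / n - g))
    return d
```

Applied to a particle set at each time step, this yields the statistic whose distribution is shown by the histograms in Figure 7.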

Figure 6:

Two examples of a linear one-dimensional process observed through a gaussian population (see equation 3.19) of gaussian neurons (see equation 3.3), filtered using the ADF approximation (see equations 3.8, 3.16a, 3.16b, and 3.20) and using a particle filter. Each dot corresponds to a spike, with the vertical location indicating the neuron's preferred stimulus $\theta$. The approximate posterior means obtained from ADF and particle filtering are shown in blue and red, respectively, with the corresponding posterior standard deviations in shaded areas of the respective colors. The curves to the right of the graph show the preferred stimulus density (black) and a neuron's tuning function, equation 3.3, centered at $\theta=0$ (gray), arbitrarily normalized to the same height for visualization. The bottom graph shows the posterior variance. Parameters used in both examples: $a=-0.1$, $d=1$, $H=1$, $\sigma_{\mathrm{pop}}^2=4$, $R^{-1}=0.25$, $c=0$, $\mu_0=0$, $\sigma_0^2=1$ (note the different extent of the time axes). The observed processes were initialized from their steady-state distribution. The dynamics were discretized with time step $\Delta t=10^{-3}$. The particle filter uses 1000 particles with systematic resampling (see, e.g., Doucet & Johansen, 2009) at each time step.

Figure 7:

Two-dimensional histograms of approximation errors (relative to the particle filter) versus the KS statistic of the particle distribution, with color indicating the count of time steps falling within each two-dimensional bin. The KS statistic is plotted against the estimation errors $\epsilon_\mu$, $\epsilon_\sigma$ (see main text). For clearer visualization, trials were omitted from the 2D histograms where the respective errors are below their 0.5 percentile value or above their 99.5 percentile value, as well as trials where the KS statistic is above its 99.5 percentile value. The 1D histograms above and to the right of each plot show the distribution of the KS statistic, of $\epsilon_\mu$, and of $\epsilon_\sigma$. Orange lines represent the theoretical distribution of the KS statistic for gaussian data (the Lilliefors distribution), obtained via Monte Carlo simulation. Data were collected from 100 trials of 1000 time steps with step size $\Delta t=10^{-3}$, using 10,000 particles with systematic resampling at each time step.

Figure 8 shows an example of filtering a two-dimensional process with dynamics
$$dX_t = \begin{pmatrix} 0 & 1\\ 0 & -0.1 \end{pmatrix} X_t\,dt + \begin{pmatrix} 0\\ 1 \end{pmatrix} dW_t,$$
(4.1)
which may be interpreted as the position and velocity of a particle subject to friction proportional to its velocity, as well as a gaussian white noise external force. In this example, only the position is directly observed by the neural population. (Additional details are given in the figure caption.) The distribution of approximation errors and the KS statistic in this two-dimensional setting is shown in Figure 9. The approximation errors plotted are $\epsilon_\mu$ and $\epsilon_\sigma$ as defined above; both these errors and the KS statistic are computed separately for each dimension.
Figure 8:

An example two-dimensional process (see equation 4.1) observed through a gaussian population (see equation 3.19) of gaussian neurons (see equation 3.3; also see Figure 6). The sensory population is gaussian (see equations 3.3 and 3.19), with $H=(1,0)$, so only the particle position is directly observed. Other sensory parameters are $\alpha^2=R^{-1}=0.25$, $h=10$, $\Sigma_{\mathrm{pop}}=4$. The process was initialized from $N([0,0]^\top, I_2)$, where $I_2$ is the $2\times 2$ identity matrix. Other details are as in Figure 6.

Figure 9:

Two-dimensional histograms of approximation errors (relative to particle filter) versus KS statistic of particle distribution in a two-dimensional example (see equation 4.1). Details are as in Figure 7.

Statistics of the estimation error distribution for these examples are provided in Table 1.

Table 1:
Approximation Errors Relative to PF in the Examples of Figures 6 and 8.

(a) 1D Example (Figure 6)

                          h = 1000                  h = 2
                       ε_μ         ε_σ          ε_μ          ε_σ
Median                −0.00272    1.29×10⁻⁴   −2.84×10⁻⁴    2.96×10⁻⁴
5th percentile        −0.0601    −0.0185      −0.0184      −0.0245
95th percentile        0.0482     0.0192       0.0186       0.0178
Mean                  −0.00415    1.41×10⁻⁴    3.34×10⁻⁴   −9.35×10⁻⁴
SD                     0.0345     0.0126       0.0119       0.0122
Median absolute value  0.0188     0.00722      0.00662      0.00766
Mean absolute value    0.0251     0.00919      0.0086       0.00942

(b) 2D Example (Figure 8)

                          Position                  Velocity
                       ε_μ         ε_σ          ε_μ          ε_σ
Median                −1.32×10⁻⁴   2.28×10⁻⁴   −3.37×10⁻⁴   −1.53×10⁻⁴
5th percentile        −0.0337     −0.0253      −0.0234      −0.0148
95th percentile        0.0361      0.0257       0.0258       0.0154
Mean                  −0.00101     2.95×10⁻⁵    1.51×10⁻⁵    2.95×10⁻⁵
SD                     0.0236      0.0157       0.0169       0.00922
Median absolute value  0.0115      0.00920      0.00908      0.00564
Mean absolute value    0.0163      0.0118       0.0121       0.00711

5  Encoding

We demonstrate the use of the assumed density filter in determining optimal encoding strategies: selecting the optimal population parameters ϕ (see section 2). To illustrate the use of ADF for the encoding problem, we consider two simple examples. We also use the first example as a test for the filter's robustness. We will study optimal encoding issues in more detail in a subsequent paper.

5.1  Optimal Encoding Depends on Prior Variance

Previous work using a finite neuron population and a Fisher information-based criterion (Harper & McAlpine, 2004) has suggested that the optimal distribution of preferred stimuli depends on the prior variance. When it is small relative to the tuning width, optimal encoding is achieved by placing all preferred stimuli at a fixed distance from the prior mean. When the prior variance is large relative to the tuning width, optimal encoding is uniform (see Figure 2 in Harper & McAlpine, 2004). These results are consistent with biological observations reported in Brand, Behrend, Marquardt, McAlpine, & Grothe (2002) concerning the encoding of aural stimuli.

Similar results are obtained with our model, as shown in Figure 10. Whereas Harper and McAlpine (2004) implicitly assumed a static state in the computation of Fisher information, we use a time-varying scalar state. The state obeys the dynamics
$$dX_t = aX_t\,dt + s\,dW_t,\qquad a<0,$$
(5.1)
and is observed through a gaussian population (see equation 3.19) and filtered using the ADF approximation. In this case, optimal encoding is interpreted as the simultaneous optimization of the population center $c$ and the population variance $\sigma_{\mathrm{pop}}^2$. The process is initialized from its steady-state distribution, so that the prior is constant in time, with variance $s^2/(-2a)$. In Figure 10 (left), the steady-state prior distribution is narrow relative to the tuning width, leading to an optimal population with a narrow population distribution far from the origin. In Figure 10 (right), the prior is wide relative to the tuning width, leading to an optimal population whose variance roughly matches the prior variance.
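As a quick sanity check on the steady-state prior (an illustration, not part of the original analysis), one can verify by Euler-Maruyama simulation that the stationary variance of equation 5.1 is $s^2/(-2a)$; all names and parameters below are ours:

```python
import numpy as np

def ou_stationary_var(a, s, dt=0.01, burn=1000, steps=2000, trials=2000, seed=0):
    """Euler-Maruyama estimate of the stationary variance of
    dX = a X dt + s dW with a < 0; theory gives s^2 / (-2a)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(trials)
    acc, n = 0.0, 0
    for t in range(burn + steps):
        x = x + a * x * dt + s * np.sqrt(dt) * rng.standard_normal(trials)
        if t >= burn:            # average x^2 only after the burn-in period
            acc += float(np.mean(x * x))
            n += 1
    return acc / n
```

For instance, with $a=-0.5$ and $s=1$ the theoretical stationary variance is $s^2/(-2a)=1$, which the simulation reproduces up to Monte Carlo and discretization error.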
Figure 10:

Optimal population distribution depends on the prior variance relative to the tuning function width. A scalar state with dynamics $dX_t = aX_t\,dt + s\,dW_t$ ($a=-0.05$) is filtered with tuning parameters $h=50$, $\alpha=1$ and preferred stimulus density $N(c,\sigma_{\mathrm{pop}}^2)$. The process is initialized from its steady-state distribution, $N(0, s^2/(-2a))$. Both graphs show the posterior standard deviation derived from the ADF approximation relative to the prior standard deviation $\sigma_p$. In the left graph, $s=0.1$, so the prior variance is 0.1, whereas on the right, $s=0.5$, so the prior variance is 2.5. In both cases, the filter is initialized with the correct prior, and the posterior variance is averaged over the time interval $[1,2]$ and across 1000 trials for each data point. Only nonnegative values of $c$ were simulated, but note that the process is symmetric about zero, so that the full plots would also be symmetric about $c=0$. The areas in white in the right plot correspond to parameters where the computed posterior variance exceeded the prior variance. This is due to the poor performance of the ADF approximation for these parameter values in cases where no spikes occur and the true posterior becomes bimodal.

Note that we are optimizing only the two parameters $c, \sigma_{\mathrm{pop}}^2$ rather than each preferred stimulus as in Harper and McAlpine (2004). This mitigates the considerable additional computational cost due to simulating the decoding process rather than computing Fisher information. This direct simulation, though more computationally expensive, offers two advantages over the Fisher information-based method used in Harper and McAlpine (2004) and prevalent in computational neuroscience. First, the simple computation of Fisher information from tuning functions, commonly used in the neuroscience literature, is based on the assumption of a static state, whereas our method can be applied in a fully dynamic context, including the presence of observation-dependent feedback. Second, simulations of the decoding process allow for the minimization of arbitrary criteria, including the direct minimization of posterior variance or mean square error (MSE). As discussed in section 1, Fisher information may be a poor proxy for the MSE for finite decoding times.

We also test the robustness of the filter to inaccuracies in model parameters or observation/encoding parameters in this problem. We use inaccurate values for the model parameter a in equation 5.1 and the observation/encoding parameter h in equation 3.13 in the filter. Specifically, we multiply or divide each of the two parameters by a factor of 5/4 in the filter, while the dynamics and the observing neural population remain unaltered. The results remain qualitatively similar, as seen in Figure 11.

Figure 11:

Robustness of the encoding result of Figure 10 to decoding with an inaccurate model of the dynamics or of the observation and encoding process. As in Figure 10, color indicates the posterior standard deviation derived from the ADF approximation relative to the prior standard deviation. The values of the dynamics parameter $a$ and the population parameter $h$ used by the filter, respectively denoted $\hat a, \hat h$, are varied while the dynamics and observation processes remain the same. Each row corresponds to a value of $\hat a$ and each column to a value of $\hat h$. White crosses denote the values where the posterior standard deviation is minimized.

5.2  Adaptation of Homogeneous Population to Stimulus Statistics

Benucci, Saleem, and Carandini (2013) measured the tuning of cat primary visual cortex (V1) neurons to oriented gratings after adapting to either a uniform distribution of orientations or a distribution with one orientation being more common. The population's preferred stimuli are roughly uniformly distributed and adapt to the prior statistics through change in amplitude. The adapted population was observed to have a decreased firing rate near the common orientation so that the mean firing rate is constant across the population.

We present a simplified model where optimal encoding exhibits a constant mean firing rate across a neural population. A random state X is drawn from a prior distribution that is a mixture of two uniform distributions on intervals sharing an end point (see Figure 12a),
\[ p(x) = \sum_{i=1}^{2} k_i \, \frac{\mathbf{1}_{\{a_i \le x \le b_i\}}}{b_i - a_i}, \qquad k_1 + k_2 = 1, \quad b_1 = a_2. \]
(5.2)
The neural population consists of gaussian neurons (see equation 3.13) with preferred locations uniformly distributed on the interval [a_1, b_2]. However, they are allowed to adapt to different tuning amplitudes h_1, h_2 in the subintervals [a_1, b_1], [a_2, b_2], respectively. Thus, the population distribution is given by a mixture of two uniform components,
\[ f(dy) = f(dh, d\theta, dH', dR') = \delta_H(dH')\,\delta_R(dR') \sum_i \delta_{h_i}(dh)\, f_i(d\theta), \qquad f_i(d\theta) \propto \mathbf{1}_{\{a_i \le \theta \le b_i\}}\, d\theta, \]
(5.3)
where H=1. We optimize the parameters h1,h2 to minimize accumulated decoding MSE over a finite decoding interval under a constraint on the total firing rate of the population,
minh1,h2E0TX-X^t2dt,s.t.rEλX;yfdyr,
(5.4)
where \hat X_t = E[X \mid N_t] is the MMSE estimate of X. The total firing rate r may be obtained from equations 3.13, 5.2, and 5.3, yielding
\[ r = r_1 + r_2, \qquad r_i = \sum_{j=1}^{2} k_j \int_{a_j}^{b_j} \frac{dx}{b_j - a_j} \int_{a_i}^{b_i} d\theta\, h_i \exp\!\left( -\frac{(x - \theta)^2}{2\alpha^2} \right) = \sqrt{2\pi}\, \alpha^2 \sum_{j=1}^{2} \frac{k_j h_i}{b_j - a_j} \left[ \Phi_1\!\left( \frac{a_j - b_i}{\alpha} \right) + \Phi_1\!\left( \frac{b_j - a_i}{\alpha} \right) - \Phi_1\!\left( \frac{a_j - a_i}{\alpha} \right) - \Phi_1\!\left( \frac{b_j - b_i}{\alpha} \right) \right], \]
(5.5)
where r_i is the total firing rate in the ith subpopulation, \alpha^2 = R^{-1}, \Phi_1(x) \triangleq x\Phi(x) + \phi(x), and \phi, \Phi are, respectively, the pdf and cdf of the standard gaussian distribution. In this continuous-population approximation, the firing rate of a single neuron with preferred stimulus \theta is proportional to E[\lambda(X; \theta)]. We use narrow tuning (\alpha = 0.1), so that the rate E[\lambda(X; \theta)] is nearly constant within each subpopulation, justifying the approximation
\[ \mathbb{E}[\lambda(X; \theta)] \approx \frac{1}{b_i - a_i} \int_{a_i}^{b_i} \mathbb{E}[\lambda(X; \theta')]\, d\theta' = \frac{r_i}{b_i - a_i}, \qquad a_i \le \theta \le b_i. \]
(5.6)
To solve this optimization problem, we approximate \hat X_t by the ADF filter output \mu_t. We further assume that the MSE is a decreasing function of the total firing rate r, so that the solution is obtained with the firing-rate constraint active, r = r^* (the bound in equation 5.4). Although we have no proof of this claim, it appears reasonable, since the posterior variance decreases at each spike (see section 3.1.5). The stronger constraint r = r^* leaves a single degree of freedom, the amplitude ratio h_1/h_2. In Figure 12b we evaluate \mathbb{E} \int_0^T (X - \mu_t)^2\, dt using Monte Carlo simulation for various values of the ratio h_1/h_2, as well as of the location of the interval end point b_2. Although the optimization is constrained only by the total firing rate, the minimal MSE is obtained near the solution of
\[ \frac{r_1}{b_1 - a_1} = \frac{r_2}{b_2 - a_2}, \]
where firing rates are equalized across the population.
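The closed form in equation 5.5 can be checked against direct numerical integration of its defining double integral. The sketch below does this in Python; the parameter values are arbitrary illustrations, not those of Figure 12.

```python
import numpy as np
from math import erf, sqrt, pi, exp

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):   # standard normal pdf
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def Phi1(x):  # Phi_1(x) = x*Phi(x) + phi(x), an antiderivative of Phi
    return x * Phi(x) + phi(x)

def rate_closed_form(i, k, h, a, b, alpha):
    """r_i from the closed form of equation 5.5."""
    total = 0.0
    for j in range(2):
        comb = (Phi1((a[j] - b[i]) / alpha) + Phi1((b[j] - a[i]) / alpha)
                - Phi1((a[j] - a[i]) / alpha) - Phi1((b[j] - b[i]) / alpha))
        total += k[j] * h[i] / (b[j] - a[j]) * comb
    return sqrt(2.0 * pi) * alpha ** 2 * total

def rate_numeric(i, k, h, a, b, alpha, n=2000):
    """Midpoint-rule evaluation of the defining double integral."""
    total = 0.0
    for j in range(2):
        dx = (b[j] - a[j]) / n
        dth = (b[i] - a[i]) / n
        x = a[j] + (np.arange(n) + 0.5) * dx
        th = a[i] + (np.arange(n) + 0.5) * dth
        g = h[i] * np.exp(-(x[:, None] - th[None, :]) ** 2 / (2 * alpha ** 2))
        total += k[j] / (b[j] - a[j]) * g.sum() * dx * dth
    return total

k, h = [0.1, 0.9], [2.0, 1.0]       # illustrative mixture weights and amplitudes
a, b = [-1.0, 0.0], [0.0, 1.0]      # subintervals sharing the end point 0
for i in range(2):
    print(rate_closed_form(i, k, h, a, b, 0.1), rate_numeric(i, k, h, a, b, 0.1))
```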
Figure 12:

Optimal population in the mixture model equalizes firing rates of different neurons. (a) Prior and neural population. (b) Estimation MSE as a function of height ratio h_1/h_2 and interval end point b_2. Parameters are k_1 = 0.1, k_2 = 0.9, a_1 = -1, b_1 = a_2 = 0, T = 1, r = 10, \alpha^2 = 0.01, H = 1. The dotted white line indicates the ratio where firing rates are equalized between the two subpopulations according to the approximation equation 5.6, and the orange line the ratio where the MSE is minimized. The MSE was computed by averaging across 10,000 trials for each data point.

6  Comparison to Previous Work

6.1  Neural Decoding

Table 2 provides a concise comparison of our setting and results to previous work on optimal neural decoding. As noted in section 1, we focus our comparison on analytically expressible continuous-time filters.

Table 2:
Summary of the Setting and Results of Several Previous Works Based on Continuous-Time Filtering of Point-Process Observations, in Comparison to Our Work.
Reference | Dynamics | Rates | Population | Exact | Complexity
Snyder (1972); Segall, Davis, and Kailath (1975) | Diffusion | Any | Finite^a | ✓ | ∞
Snyder (1972)^b | Diffusion | Any | Finite | – | n
Rhodes and Snyder (1977) | lin. G. diff. | Gaussian | Uniform | ✓ | 1
Komaee (2010) | lin. G. diff. | Any | Uniform | – | 1
Huys et al. (2007) | 1d G. process | Gaussian | Uniform | ✓ | N_t^3 ^c
Eden and Brown (2008) | lin. G. diff. | Any | Finite | – | n
Susemihl et al. (2011, 2013) | 1d G. Mat. | Gaussian | Uniform | – | 1
Bobrowski et al. (2009); Twum-Danso and Brockett (2001) | CTMC | Any | Finite | ✓ | n
This work | Diffusion | Gaussian | Nonuniform | – | 1^d

(Dynamics, Rates, and Population describe the neural code; Exact and Complexity describe the decoding.)

Notes: The complexity column lists the asymptotic computational complexity of each time step in a discrete-time implementation of the filter, as a function of population size n and number of spikes N_t. A complexity of ∞ denotes an infinite-dimensional filter. Lin. G. diff.: linear gaussian diffusion; 1d G. Mat.: one-dimensional gaussian process with Matérn kernel autocorrelation; CTMC: continuous-time Markov chain.

^a The setting of Snyder (1972) and Segall et al. (1975) is a single observation point process, but the result is readily extended to a finite population by summation.

^b Snyder (1972) includes both an exact PDE for the posterior statistics and an approximate solution.

^c This filter is nonrecursive, and its complexity grows with the history of spikes. The exponent 3 is related to the inversion of an N_t × N_t matrix, which in principle can be performed with lower complexity.

^d The complexity is constant for distributions over preferred stimuli as described in section 3.2.3 (including the uniform coding case). For heterogeneous mixture populations, the complexity is linear in the number of mixture components.

Few previous analytically oriented works studied neural decoding for dynamic stimuli or without the uniform coding assumption. The filtering problem for a dynamic state with a general (nonuniform) finite population is tractable when the state space is finite; in this case, the posterior is finite-dimensional and obeys SDEs derived in Brémaud (1981). These results have been presented in a neuroscience context in Twum-Danso and Brockett (2001) and Bobrowski et al. (2009).

We are aware of a single previous work (Eden & Brown, 2008) that derived a closed-form filter for nonuniform coding of a diffusion process in continuous time. The setting of Eden and Brown (2008) is a finite population with arbitrary tuning functions, and the derivation uses an approximation similar to the one used in our letter. Our work differs from Eden and Brown (2008) most notably in the use of a parameterized "infinite" population. In this sense, the setting of Eden and Brown (2008) is less general, corresponding to the case of a finite population described in section 3.1. On the other hand, Eden and Brown (2008) deal with a more general form of the tuning functions. Although they also approximate the posterior using a gaussian distribution, their filter is not equivalent to ours. The two filters are compared in detail in appendix D.

6.2  Neural Encoding

As the work we report in this letter is concerned with efficient neural decoding as a tool for studying neural encoding in nonuniform populations, we briefly mention some work that studies neural encoding analytically. As we have noted, we will study encoding in more detail in a subsequent paper. Much previous work used Fisher information to study nonuniform coding of static stimuli. As we noted in section 1, Fisher information does not necessarily provide a good estimate of achievable decoding performance, and optimizing it may yield qualitatively misleading results (see Bethge et al., 2002; Pilarski & Pokora, 2015; Yaeli & Meir, 2010). However, we note one such work, Ganguli and Simoncelli (2014), that is particularly relevant in comparison to ours, as it similarly studies large parameterized populations. The neural populations Ganguli and Simoncelli (2014) studied are characterized by two functions of the stimulus: a density function, which describes the local density of neurons, and a "gain" function, which modulates each neuron's gain according to its preferred location. The density function also distorts tuning functions, so that tuning width is approximately inversely proportional to neuron density, resulting in approximately uniform coding when the gain function is constant; a nonconstant gain function, however, produces a violation of uniform coding. This population is optimized for Fisher information and several related measures. For each of these measures, the optimal population density is shown to be monotonic in the prior: more neurons should be assigned to more probable states. This is in contrast to the results we present in section 5 (e.g., see Figure 10, left), where the optimal population distribution is shifted relative to the prior distribution when the prior is narrow.
This discrepancy may be attributed to the limitations of Fisher information in predicting actual decoding error or to the coupling of neuron density and tuning width used in Ganguli and Simoncelli (2014) to facilitate the derivation of closed-form solutions.

Some previous work attempted direct minimization of decoding MSE rather than Fisher information. Yaeli and Meir (2010) derived an explicit expression for the MSE in the static case with uniform coding and used it to characterize optimal tuning function width and its relation to coding time. More recently, Wang, Stocker, and Lee (2016) studied Lp-based loss measures and presented exact results for optimal tuning functions in the case of univariate static signals, for single neurons and homogeneous populations. Susemihl et al. (2011, 2013) suggested a mean-field approximation to allow efficient evaluation of the MSE in a dynamic setting.

Notes

1

Although we describe this work using neuroscience terminology, the motivation and formulation in Rhodes and Snyder (1977) are not related to neuroscience.

2
For an introduction to SDEs and the Wiener process see, for example, Øksendal (2003). Intuitively, equation 3.1 may be interpreted as a differential equation driven by continuous-time gaussian white noise \xi_t,
\[ \dot X_t = A X_t + D(X_t)\, \xi_t, \]
or as the limit as \Delta t \to 0 of the discretized dynamics,
\[ X_{(k+1)\Delta t} = X_{k\Delta t} + A X_{k\Delta t}\, \Delta t + D(X_{k\Delta t})\, \xi_k \sqrt{\Delta t}, \]
where the \xi_k are independent standard gaussian variables.
3

In the sense that any two solutions X_t^1, X_t^2 defined over [0, T] with X_0^1 = X_0^2 satisfy P[X_t^1 = X_t^2 \; \forall t \in [0, T]] = 1. See, e.g., Øksendal (2003, theorem 5.2.1) for sufficient conditions.

4

If equation 3.1 includes an additional feedback (control) term, Ni are not DSPPs. Our results apply with little modification to this case, as described in appendix A.

5

See Segall and Kailath (1975, theorem 2).

6

The setting of Snyder (1972) includes a single observation point process. The extension to several point processes is obtained through summation as in equation 3.5 and is a special case of a more general PDE described in Rhodes and Snyder (1977) and discussed in appendix A.

7

This holds only approximately, when the tuning width is not too large relative to the size of the interval. For wider tuning functions, the behavior becomes similar to the single sensor case.

8

A more detailed discussion of this definition and of innovation processes is available in Segall and Kailath (1975).

9

That is, it is strictly a function of the history Ft.

10

The setting considered in Rhodes and Snyder (1977) assumes linear dynamics and an infinite uniform population. However, these assumptions are only relevant to establishing other propositions in that paper. The proof of equation 3.5 still holds as is in our more general setting.

Appendix A:  Derivation of Filtering Equations

A.1  Setting and Notation

In the main text, we have presented our model in an open-loop setting, where the process X is passively observed. Here we consider a more general setting, incorporating a control process Ut, so the dynamics are
\[ dX_t = (A X_t + B U_t)\, dt + D(X_t)\, dW_t, \]
(A.1)
where, in general, Ut is a function of Nt.

We denote by p_t^N(\cdot) the posterior density (the density of X_t given \mathcal{N}_t) and by E_t^N[\cdot] the posterior expectation (expectation conditioned on \mathcal{N}_t).

A.2  The Innovation Measure

We derive the filtering equations in terms of the innovation measure of the marked point process, which is a random measure closely related to the notion of the innovation process associated with unmarked point processes or diffusion processes. The role of the innovation measure in filtering marked point processes is analogous to the role of the innovation process in Kalman filtering.

In section 3.1.1, we characterized the intensity (or rate) of each point process in a finite population using the asymptotic behavior of jump probabilities in small intervals. An alternative definition is the following.^8 Consider an unmarked point process N_t, with \mathcal{N}_t denoting its history up to time t. Given some history \mathcal{F}_t containing \mathcal{N}_t (e.g., \mathcal{F}_t might include the observation process N and its driving state process X), the process \lambda_t^F is called the intensity of N relative to \mathcal{F} when \lambda_t^F is \mathcal{F}_t-measurable^9 and
\[ I_t^F \triangleq N_t - \int_0^t \lambda_s^F\, ds \]
is an \mathcal{F}_t-martingale, meaning
\[ \mathbb{E}[I_t^F \mid \mathcal{F}_s] = I_s^F, \]
or, equivalently,
\[ \mathbb{E}[N_t \mid \mathcal{F}_s] = N_s + \mathbb{E}\!\left[ \int_s^t \lambda_u^F\, du \,\Big|\, \mathcal{F}_s \right]. \]
Heuristically, we may write this relation as
\[ \mathbb{E}[dN_t \mid \mathcal{F}_t] = \lambda_t^F\, dt. \]
When \mathcal{F}_t = \mathcal{N}_t, the process I_t \triangleq I_t^N is called the innovation process. We may write
\[ dN_t = \lambda_t^N\, dt + dI_t = \mathbb{E}[dN_t \mid \mathcal{N}_t] + dI_t, \]
so the innovation increments dI_t represent the unexpected part of the increments dN_t after observation of \mathcal{N}_t. The innovation process may be similarly defined for other stochastic processes (such as discrete-time or diffusion processes), and it plays an important role in the Kalman and Kalman-Bucy filters. It plays an analogous role in the filtering of point processes, as seen below.
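To make the martingale property concrete, the following sketch simulates a point process with a known deterministic rate \lambda_t in discrete time (a Bernoulli-step approximation) and checks that the innovation I_T = N_T - \int_0^T \lambda_s\, ds averages to zero across realizations. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 1e-3, 2.0
steps = int(T / dt)
t = np.arange(steps) * dt
lam = 5.0 * (1.0 + 0.5 * np.sin(2 * np.pi * t))   # a deterministic rate profile

trials = 2000
spikes = rng.random((trials, steps)) < lam * dt    # Bernoulli approximation of dN_t
I_T = spikes.sum(axis=1) - lam.sum() * dt          # innovation N_T - int_0^T lambda_s ds
print(I_T.mean(), I_T.var())                       # mean ≈ 0 (martingale property)
```

With a stochastic (state-dependent) rate, the same property holds once the rate is replaced by its conditional expectation given the observed history.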
As discussed in section 3.2.2, in the continuous population model we characterize the observation process N by its intensity kernel relative to the history (\mathcal{N}_t, X_{0,t}):
\[ \mathbb{E}[N(dt, dy) \mid X_{0,t}, \mathcal{N}_t] = \lambda(X_t; y)\, f(dy)\, dt. \]
This equation is a heuristic notation for the statement that the rate of N_t(Y) relative to (\mathcal{N}_t, X_{0,t}) is \int_Y \lambda(X_t; y)\, f(dy), for any measurable Y \subseteq \mathcal{Y}. The rate relative to the spiking history \mathcal{N}_t is obtained by marginalizing over X_t (see Segall & Kailath, 1975, theorem 2), yielding the rate \int_Y \hat\lambda_t(y)\, f(dy), where
\[ \hat\lambda_t(y) = E_t^N[\lambda(X_t; y)]. \]
Therefore, the innovation process of N_t(Y) is I_t(Y) = N_t(Y) - \int_0^t \int_Y \hat\lambda_s(y)\, f(dy)\, ds, and accordingly we define the innovation measure I to be the random measure
\[ I(dt, dy) \triangleq N(dt, dy) - \hat\lambda_t(y)\, f(dy)\, dt, \]
(A.2)
so that the innovation process may be expressed as
\[ I_t(Y) = \int_0^t \int_Y I(ds, dy). \]
The martingale property of I_t(Y) implies that the innovation measure satisfies
\[ E_t^N\!\left[ \int_Y I(dt, dy) \right] = 0 \]
for all measurable Y \subseteq \mathcal{Y}.

A.3  Exact Filtering Equations

Define
\[ \omega_t(x; y) \triangleq \frac{\lambda(x; y)}{\hat\lambda_t(y)} - 1, \qquad \omega_t(y) \triangleq \omega_t(X_t; y). \]
The stochastic PDE, equation 3.5, is extended in Rhodes and Snyder (1977) to the case of marked point-process observations in the presence of feedback.^10 In this case, the posterior density p_t^N obeys the equation
\[ dp_t^N(x) = \mathcal{L}_t^* p_t^N(x)\, dt + p_{t-}^N(x) \int_{\mathcal{Y}} \omega_{t-}(x; y)\, I(dt, dy). \]
(A.3)
Here \mathcal{L}_t is the posterior infinitesimal generator, defined with an additional conditioning on \mathcal{N}_t,
\[ \mathcal{L}_t h(x) = \lim_{\Delta t \to 0^+} \frac{\mathbb{E}[h(X_{t+\Delta t}) \mid X_t = x, \mathcal{N}_t] - h(x)}{\Delta t}, \]
and \mathcal{L}_t^* is its adjoint. Note that in this closed-loop setting, the infinitesimal generator is itself a random operator, due to its dependence on past observations through the control law, and that N_t is no longer a doubly stochastic Poisson process.
As in section 3.1.3, we use the notations
\[ \mu_t \triangleq E_t^N[X_t], \qquad \tilde X_t \triangleq X_t - \mu_t, \qquad \Sigma_t \triangleq E_t^N[\tilde X_t \tilde X_t^T]. \]
We derive the following equations for the first two posterior moments, which generalize equation 3.6 to marked point processes and to the presence of feedback in the state dynamics:
\[ d\mu_t = E_t^N[A X_t + B U_t]\, dt + \int_{\mathcal{Y}} E_{t-}^N[\omega_{t-}(y) X_{t-}]\, I(dt, dy), \]
(A.4a)
\[ d\Sigma_t = E_t^N\!\left[ A X_t \tilde X_t^T + \tilde X_t (A X_t)^T + D(X_t) D(X_t)^T \right] dt + \int_{\mathcal{Y}} E_{t-}^N[\omega_{t-}(y) \tilde X_{t-} \tilde X_{t-}^T]\, I(dt, dy) - \int_{\mathcal{Y}} E_{t-}^N[\omega_{t-}(y) X_{t-}]\, E_{t-}^N[\omega_{t-}(y) X_{t-}]^T\, N(dt, dy). \]
(A.4b)
A rigorous derivation of equation A.4a under more general conditions is found in Segall et al. (1975), from which equation A.4b may be derived by considering the dynamics of the process XtXt. Here we provide a more heuristic derivation based on equation A.3.
Compare the mean update equation A.4a to the Kalman-Bucy filter, which gives the posterior moments for a diffusion process X with noisy observations Y of the form
\[ dX_t = A_t X_t\, dt + D_t\, dW_t, \qquad dY_t = H_t X_t\, dt + dV_t, \]
where W, V are independent standard Wiener processes. The Kalman-Bucy filter reads
\[ d\mu_t = A_t \mu_t\, dt + \Sigma_t H_t^T (dY_t - H_t \mu_t\, dt), \qquad \dot\Sigma_t = A_t \Sigma_t + \Sigma_t A_t^T + D_t D_t^T - \Sigma_t H_t^T H_t \Sigma_t. \]
Here, the term dY_t - H_t \mu_t\, dt appearing in the first equation is an increment of the innovation process Y_t - \int_0^t H_s \mu_s\, ds.
For a sufficiently well-behaved function h, we find, using equation A.3 and the definition of the operator adjoint,
\[ dE_t^N[h(X_t)] = \int dx\, h(x) \left[ \mathcal{L}_t^* p_t^N(x)\, dt + p_{t-}^N(x) \int_{\mathcal{Y}} \omega_{t-}(x; y)\, I(dt, dy) \right] = \int dx\, p_t^N(x)\, \mathcal{L}_t h(x)\, dt + \int_{\mathcal{Y}} \int dx\, p_{t-}^N(x)\, h(x)\, \omega_{t-}(x; y)\, I(dt, dy) = E_t^N[\mathcal{L}_t h(X_t)]\, dt + \int_{\mathcal{Y}} E_{t-}^N[h(X_{t-})\, \omega_{t-}(y)]\, I(dt, dy). \]
(A.5)
Assuming the state evolves as in equation A.1, the (closed-loop) infinitesimal generator is
\[ \mathcal{L}_t h(x) = (A x + B U_t)^T \nabla h(x) + \tfrac{1}{2} \mathrm{Tr}\!\left( \nabla^2 h(x)\, D(x) D(x)^T \right), \]
which, when specialized to the functions h_i(x) = x_i and h_{ij}(x) = x_i x_j, where x_i is the ith component of x, reads
\[ \mathcal{L}_t h_i(x) = (A x + B U_t)_i, \]
(A.6)
\[ \mathcal{L}_t h_{ij}(x) = (A x + B U_t)_i x_j + x_i (A x + B U_t)_j + \tfrac{1}{2} \left( D(x) D(x)^T \right)_{ij} + \tfrac{1}{2} \left( D(x) D(x)^T \right)_{ji} = (A x + B U_t)_i x_j + x_i (A x + B U_t)_j + \left( D(x) D(x)^T \right)_{ij}. \]
(A.7)
Substituting equation A.6 into A.5 yields
\[ d\mu_t^i = E_t^N[(A X_t + B U_t)_i]\, dt + \int_{\mathcal{Y}} E_{t-}^N[X_{t-}^i\, \omega_{t-}(y)]\, I(dt, dy), \]
or, in vector notation,
\[ d\mu_t = E_t^N[A X_t + B U_t]\, dt + \int_{\mathcal{Y}} E_{t-}^N[\omega_{t-}(y) X_{t-}]\, I(dt, dy). \]
(A.8)
To compute d\Sigma_t, we use the representation d\Sigma_t = dE_t^N[X_t X_t^T] - d(\mu_t \mu_t^T). The first term is computed by substituting equation A.7 into A.5, yielding
\[ dE_t^N[X_t^i X_t^j] = E_t^N\!\left[ (A X_t + B U_t)_i X_t^j + X_t^i (A X_t + B U_t)_j \right] dt + E_t^N\!\left[ \left( D(X_t) D(X_t)^T \right)_{ij} \right] dt + \int_{\mathcal{Y}} E_{t-}^N[X_{t-}^i X_{t-}^j\, \omega_{t-}(y)]\, I(dt, dy), \]
or, in matrix notation, after some rearranging,
\[ dE_t^N[X_t X_t^T] = E_t^N\!\left[ A X_t X_t^T + X_t (A X_t)^T \right] dt + \left( B U_t \mu_t^T + \mu_t (B U_t)^T \right) dt + E_t^N\!\left[ D(X_t) D(X_t)^T \right] dt + \int_{\mathcal{Y}} E_{t-}^N[\omega_{t-}(y)\, X_{t-} X_{t-}^T]\, I(dt, dy). \]
(A.9)
To calculate d(\mu_t \mu_t^T) from equation A.8, we separately handle the continuous terms and the jump term involving N(dt, dy). The continuous terms are the continuous part of (d\mu_t)\mu_t^T + \mu_t (d\mu_t)^T. To compute the jump terms, we note that when \mu_t jumps by \Delta_t, the corresponding jump in \mu_t \mu_t^T is \Delta_t \mu_t^T + \mu_t \Delta_t^T + \Delta_t \Delta_t^T; therefore,
\[ d(\mu_t \mu_t^T) = \left( E_t^N[A X_t]\, \mu_t^T + \mu_t\, E_t^N[A X_t]^T \right) dt + \left( B U_t \mu_t^T + \mu_t (B U_t)^T \right) dt - \int_{\mathcal{Y}} \left( E_t^N[\omega_t(y) X_t]\, \mu_t^T + \mu_t\, E_t^N[\omega_t(y) X_t]^T \right) \hat\lambda_t(y)\, f(dy)\, dt \]
(A.10)
\[ + \int_{\mathcal{Y}} \left( E_{t-}^N[\omega_{t-}(y) X_{t-}]\, \mu_{t-}^T + \mu_{t-}\, E_{t-}^N[\omega_{t-}(y) X_{t-}]^T + E_{t-}^N[\omega_{t-}(y) X_{t-}]\, E_{t-}^N[\omega_{t-}(y) X_{t-}]^T \right) N(dt, dy). \]
(A.11)
Subtracting equations A.10 and A.11 from A.9, and noting that E_t^N[\omega_t(y)] = 0, so that
\[ E_{t-}^N[\omega_{t-}(y)\, \tilde X_{t-} \tilde X_{t-}^T] = E_{t-}^N[\omega_{t-}(y)\, X_{t-} X_{t-}^T] - E_{t-}^N[\omega_{t-}(y) X_{t-}]\, \mu_{t-}^T - \mu_{t-}\, E_{t-}^N[\omega_{t-}(y) X_{t-}]^T, \]
yields equation A.4b.
Writing equations A.4a and A.4b according to the decomposition described in section 3.1.3,
\[ d\mu_t^\pi = E_t^N[A X_t + B U_t]\, dt, \qquad d\Sigma_t^\pi = E_t^N\!\left[ A X_t \tilde X_t^T + \tilde X_t (A X_t)^T + D(X_t) D(X_t)^T \right] dt, \qquad d\mu_t^c = -\int_{\mathcal{Y}} E_t^N[\omega_t(y) X_t]\, \hat\lambda_t(y)\, f(dy)\, dt, \]
(A.12)
\[ d\Sigma_t^c = -\int_{\mathcal{Y}} E_t^N[\omega_t(y)\, \tilde X_t \tilde X_t^T]\, \hat\lambda_t(y)\, f(dy)\, dt, \]
(A.13)
\[ d\mu_t^N = \int_{\mathcal{Y}} E_{t-}^N[\omega_{t-}(y) X_{t-}]\, N(dt, dy), \]
(A.14)
\[ d\Sigma_t^N = \int_{\mathcal{Y}} \left( E_{t-}^N[\omega_{t-}(y)\, \tilde X_{t-} \tilde X_{t-}^T] - E_{t-}^N[\omega_{t-}(y) X_{t-}]\, E_{t-}^N[\omega_{t-}(y) X_{t-}]^T \right) N(dt, dy). \]
(A.15)

A.4  ADF Approximation for Gaussian Tuning

We now proceed to apply the gaussian ADF approximation p_t^N(x) \approx \mathcal{N}(x; \mu_t, \Sigma_t) to equations A.12 to A.15 in the case of gaussian neurons (see equation 3.3), deriving approximate filtering equations written in terms of the population density f(dy). From here on, we use \mu_t, \Sigma_t, and p_t^N to refer to the ADF approximation rather than to the exact values.

We use the following algebraic results. The first is a slightly generalized form of a well-known result about the sum of quadratic forms, which is useful for multiplying gaussians with possibly degenerate precision matrices:

Claim 1. Let x, a, b \in \mathbb{R}^n and let A, B \in \mathbb{R}^{n \times n} be symmetric matrices such that A + B is nonsingular. Then
\[ \|x - a\|_A^2 + \|x - b\|_B^2 = \|a - b\|_{A(A+B)^{-1}B}^2 + \left\| x - (A+B)^{-1}(A a + B b) \right\|_{A+B}^2, \]
where \|v\|_M^2 \triangleq v^T M v.
Proof.

The proof is accomplished by straightforward expansion of each side.
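Claim 1 is stated for possibly degenerate A; a quick numerical check (with random illustrative matrices and a rank-deficient A) confirms the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def quad(v, M):
    """Quadratic form ||v||_M^2 = v^T M v."""
    return float(v @ M @ v)

# A is degenerate (rank 1), B positive definite, so A + B is nonsingular
u = rng.standard_normal(n)
A = np.outer(u, u)
Bh = rng.standard_normal((n, n))
B = Bh @ Bh.T + np.eye(n)

a, b, x = rng.standard_normal((3, n))
AB_inv = np.linalg.inv(A + B)

lhs = quad(x - a, A) + quad(x - b, B)
rhs = quad(a - b, A @ AB_inv @ B) + quad(x - AB_inv @ (A @ a + B @ b), A + B)
print(abs(lhs - rhs))   # should be at floating-point precision
```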

Next, we note two matrix inversion lemmas, the first of which is known as the Woodbury identity. These are useful for transferring variance matrices between state and perceptual coordinates in our model. Derivations may be found in Henderson and Searle (1981).

Claim 2. For U \in \mathbb{R}^{m \times n}, V \in \mathbb{R}^{n \times m}, and nonsingular A \in \mathbb{R}^{m \times m}, C \in \mathbb{R}^{n \times n}, the following equalities hold,
\[ (A + U C V)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}, \qquad A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} = (A + U C V)^{-1} U C, \]
(A.16)
whenever all the relevant inverses exist.
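Both identities in claim 2 are easy to verify numerically (random illustrative matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
U = rng.standard_normal((m, n))
V = rng.standard_normal((n, m))
A = rng.standard_normal((m, m)) + 5 * np.eye(m)   # safely nonsingular
C = rng.standard_normal((n, n)) + 5 * np.eye(n)

Ai, Ci = np.linalg.inv(A), np.linalg.inv(C)
inner = np.linalg.inv(Ci + V @ Ai @ U)

# Woodbury identity: (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}
woodbury = Ai - Ai @ U @ inner @ V @ Ai
direct = np.linalg.inv(A + U @ C @ V)
print(np.abs(woodbury - direct).max())

# second identity: A^{-1} U (C^{-1} + V A^{-1} U)^{-1} = (A + U C V)^{-1} U C
print(np.abs(Ai @ U @ inner - direct @ U @ C).max())
```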
To evaluate the posterior expectations in equations A.12 to A.15, we first simplify the expression
\[ p_t^N(x)\, \omega_t(x; y) = \frac{p_t^N(x)\, \lambda(x; y)}{\int p_t^N(\xi)\, \lambda(\xi; y)\, d\xi} - p_t^N(x). \]
(A.17)
Using the gaussian ADF approximation p_t^N(x) = \mathcal{N}(x; \mu_t, \Sigma_t), equation 3.13, and claim 1, we find
\[ p_t^N(x)\, \lambda(x; h, \theta, H, R) = h\, \mathcal{N}(x; \mu_t, \Sigma_t) \exp\!\left( -\tfrac{1}{2} \|H x - \theta\|_R^2 \right) = \frac{h}{\sqrt{(2\pi)^n |\Sigma_t|}} \exp\!\left( -\tfrac{1}{2} \|x - \mu_t\|_{\Sigma_t^{-1}}^2 - \tfrac{1}{2} \|H x - \theta\|_R^2 \right) = \frac{h}{\sqrt{(2\pi)^n |\Sigma_t|}} \exp\!\left( -\tfrac{1}{2} \|H_r^{-1}\theta - \mu_t\|_{Q_t}^2 - \tfrac{1}{2} \|x - \mu_t^\theta\|_{\Sigma_t^{-1} + H^T R H}^2 \right), \]
where H_r^{-1} is any right inverse of H, and
\[ Q_t \triangleq \Sigma_t^{-1} \left( \Sigma_t^{-1} + H^T R H \right)^{-1} H^T R H, \]
(A.18)
\[ \mu_t^\theta \triangleq \left( \Sigma_t^{-1} + H^T R H \right)^{-1} \left( \Sigma_t^{-1} \mu_t + H^T R \theta \right). \]
(A.19)
To simplify the notation, we suppress the dependence of these and other quantities on H and R throughout this section. Claim 2 establishes the relation Q_t = H^T S_t H, where
\[ S_t \triangleq \left( R^{-1} + H \Sigma_t H^T \right)^{-1}, \]
yielding
\[ p_t^N(x)\, \lambda(x; h, \theta, H, R) = h\, (2\pi)^{-n/2} |\Sigma_t|^{-1/2} \exp\!\left( -\tfrac{1}{2} \|\theta - H\mu_t\|_{S_t}^2 - \tfrac{1}{2} \|x - \mu_t^\theta\|_{\Sigma_t^{-1} + H^T R H}^2 \right), \]
(A.20)
and by normalizing this gaussian (see equation A.17), we find that
\[ p_t^N(x)\, \omega_t(x; h, \theta, H, R) = \mathcal{N}\!\left( x;\, \mu_t^\theta,\, \left( \Sigma_t^{-1} + H^T R H \right)^{-1} \right) - p_t^N(x), \]
yielding
\[ E_t^N[\omega_t(y) X_t] = \mu_t^\theta - \mu_t, \qquad E_t^N[\omega_t(y)\, \tilde X_t \tilde X_t^T] = \left( \Sigma_t^{-1} + H^T R H \right)^{-1} + (\mu_t - \mu_t^\theta)(\mu_t - \mu_t^\theta)^T - \Sigma_t. \]
(A.21)
Using claim 2, the difference \mu_t^\theta - \mu_t may be rewritten as
\[ \mu_t^\theta - \mu_t = \left( \Sigma_t^{-1} + H^T R H \right)^{-1} \left( \Sigma_t^{-1} \mu_t + H^T R \theta \right) - \mu_t = -\left( \Sigma_t^{-1} + H^T R H \right)^{-1} H^T R H \mu_t + \left( \Sigma_t^{-1} + H^T R H \right)^{-1} H^T R \theta = -\left( \Sigma_t^{-1} + H^T R H \right)^{-1} H^T R\, \delta_t = -\Sigma_t H^T S_t \delta_t, \]
where \delta_t \triangleq H \mu_t - \theta, and an application of the Woodbury identity, equation A.16, yields
\[ \left( \Sigma_t^{-1} + H^T R H \right)^{-1} - \Sigma_t = \Sigma_t \left[ \left( I + H^T R H \Sigma_t \right)^{-1} - I \right] = \Sigma_t \left[ I - H^T \left( R^{-1} + H \Sigma_t H^T \right)^{-1} H \Sigma_t - I \right] = -\Sigma_t H^T S_t H \Sigma_t. \]
Now,
\[ E_t^N[\omega_t(y) X_t] = -\Sigma_t H^T S_t \delta_t, \qquad E_t^N[\omega_t(y)\, \tilde X_t \tilde X_t^T] = \Sigma_t H^T \left( S_t \delta_t \delta_t^T S_t - S_t \right) H \Sigma_t. \]
Plugging this result into equations A.12 to A.15 yields equations 3.16a to 3.16d. Integrating equation A.20 over x yields
\[ \hat\lambda(h, \theta, H, R) = \frac{h}{\sqrt{\left| I + \Sigma_t H^T R H \right|}} \exp\!\left( -\tfrac{1}{2} \|\theta - H\mu_t\|_{S_t}^2 \right). \]
Sylvester's determinant identity yields the equality |I + \Sigma_t H^T R H| = |R| / |S_t|, from which equation 3.16g follows.
The continuous precision update, equation 3.16h, follows directly from equation 3.16b and the relation
\[ \frac{d\Sigma_t^{-1}}{dt} = -\Sigma_t^{-1} \frac{d\Sigma_t}{dt} \Sigma_t^{-1}, \]
which holds whenever \Sigma_t is differentiable, that is, between spikes. To derive equation 3.16i, consider a spike at time t with mark y = (h, \theta, H, R). According to equation 3.16d,
\[ \Sigma_t^{-1} = \left( \Sigma_{t-} - \Sigma_{t-} H^T S_{t-} H \Sigma_{t-} \right)^{-1} = \left( I - H^T S_{t-} H \Sigma_{t-} \right)^{-1} \Sigma_{t-}^{-1} = \left( I + H^T \left( S_{t-}^{-1} - H \Sigma_{t-} H^T \right)^{-1} H \Sigma_{t-} \right) \Sigma_{t-}^{-1} \quad \text{(Woodbury identity)} = \Sigma_{t-}^{-1} + H^T R H \quad \text{(from equation 3.16f)}. \]
Finally, equations A.14 and A.21 yield d\mu_t^N = (\mu_{t-}^\theta - \mu_{t-})\, N(dt, dy), so at spike times,
\[ \mu_{t+} = \mu_{t-}^\theta = \left( \Sigma_{t-}^{-1} + H^T R H \right)^{-1} \left( \Sigma_{t-}^{-1} \mu_{t-} + H^T R \theta \right). \]
The finite-population case is equation 3.12.
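In summary, at a spike the posterior precision increases by H^T R H and the mean moves to \mu_t^\theta. The following sketch (with illustrative parameters) verifies numerically that this information-form update agrees with the covariance-form expressions \Sigma - \Sigma H^T S H \Sigma and \mu - \Sigma H^T S \delta derived above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2                      # state and tuning-space dimensions
H = rng.standard_normal((m, n))
Rh = rng.standard_normal((m, m))
R = Rh @ Rh.T + np.eye(m)        # tuning precision matrix
Sh = rng.standard_normal((n, n))
Sigma = Sh @ Sh.T + np.eye(n)    # pre-spike posterior covariance
mu = rng.standard_normal(n)      # pre-spike posterior mean
theta = rng.standard_normal(m)   # preferred stimulus of the spiking neuron

# information-form spike update: precision gains H^T R H
P_post = np.linalg.inv(Sigma) + H.T @ R @ H
Sigma_post = np.linalg.inv(P_post)
mu_post = Sigma_post @ (np.linalg.inv(Sigma) @ mu + H.T @ R @ theta)

# equivalent covariance-form update via S = (R^{-1} + H Sigma H^T)^{-1}
S = np.linalg.inv(np.linalg.inv(R) + H @ Sigma @ H.T)
delta = H @ mu - theta
Sigma_cov_form = Sigma - Sigma @ H.T @ S @ H @ Sigma
mu_cov_form = mu - Sigma @ H.T @ S @ delta
print(np.abs(Sigma_post - Sigma_cov_form).max(), np.abs(mu_post - mu_cov_form).max())
```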

A.5  Approximation of Continuous Terms for Specific Population Distributions

A.5.1  Gaussian Population

When the preferred stimuli are normally distributed, equation 3.19, the continuous update terms, equations 3.16a and 3.16b, may be computed analogously to the derivation in section A.4. First, starting from equation 3.16g, a computation analogous to the derivation of equations A.20 and 3.16g above yields
\[ \hat\lambda_t(h, \theta, H, R)\, f(d\theta) = \hat\lambda_t^f \cdot \mathcal{N}\!\left( \theta;\, \mu_t^f,\, \left( \Sigma_{\mathrm{pop}}^{-1} + S_t \right)^{-1} \right) d\theta, \]
where
\[ \mu_t^f \triangleq \left( \Sigma_{\mathrm{pop}}^{-1} + S_t \right)^{-1} \left( S_t H \mu_t + \Sigma_{\mathrm{pop}}^{-1} c \right). \]
Integrating this equation over \theta and applying Sylvester's determinant lemma as in the derivation of equation 3.16g yields equation 3.20d. The matrix Z_t(H, R) (see equation 3.20c) and the vector \mu_t^f play roles analogous to those of S_t and \mu_t^\theta, respectively, in section A.4. Substituting into equations 3.16a and 3.16b and simplifying analogously yields equation 3.20.

A.5.2  Uniform Population on an Interval

In this case, R and h are constant, m = n = 1, H = 1, and the preferred stimulus distribution is f(d\theta) = \mathbf{1}_{[a,b]}(\theta)\, d\theta, so equations 3.16a and 3.16b take the form
\[ d\mu_t^c = \frac{\sigma_t^2}{\sigma_t^2 + \alpha^2} \int_a^b (\mu_t - \theta)\, \hat\lambda_t(\theta)\, d\theta\, dt, \]
(A.22)
\[ d\sigma_t^{2,c} = \left[ \hat\lambda_t^f - \frac{1}{\sigma_t^2 + \alpha^2} \int_a^b (\theta - \mu_t)^2\, \hat\lambda_t(\theta)\, d\theta \right] \frac{\sigma_t^4}{\sigma_t^2 + \alpha^2}\, dt, \qquad \hat\lambda_t(\theta) = h \sqrt{2\pi\alpha^2}\, \mathcal{N}(\theta; \mu_t, \sigma_t^2 + \alpha^2), \]
(A.23)
where \sigma_t^2 = \Sigma_t and \alpha^2 = R^{-1}, and we suppress the dependence of \hat\lambda on h, R from the notation, since h, R are fixed. The integrals can be computed from the identities
\[ \int_a^b \mathcal{N}(x; \mu, \sigma^2)(x - \mu)\, dx = -z\sigma, \qquad \int_a^b \mathcal{N}(x; \mu, \sigma^2)(x - \mu)^2\, dx = (Z - z')\sigma^2, \]
where
\[ a' \triangleq \frac{a - \mu}{\sigma}, \quad b' \triangleq \frac{b - \mu}{\sigma}, \quad Z \triangleq \Phi(b') - \Phi(a'), \quad z \triangleq \phi(b') - \phi(a'), \quad z' \triangleq b'\phi(b') - a'\phi(a'), \quad \phi(x) \triangleq \mathcal{N}(x; 0, 1). \]
Writing
\[ a_t' \triangleq \frac{a - \mu_t}{\sqrt{\sigma_t^2 + \alpha^2}}, \quad b_t' \triangleq \frac{b - \mu_t}{\sqrt{\sigma_t^2 + \alpha^2}}, \quad Z_t \triangleq \Phi(b_t') - \Phi(a_t'), \quad z_t \triangleq \phi(b_t') - \phi(a_t'), \quad z_t' \triangleq b_t'\phi(b_t') - a_t'\phi(a_t'), \]
we find that
\[ \int_a^b (\mu_t - \theta)\, \hat\lambda_t(\theta)\, d\theta = h\sqrt{2\pi\alpha^2}\, z_t \sqrt{\sigma_t^2 + \alpha^2}, \qquad \int_a^b (\theta - \mu_t)^2\, \hat\lambda_t(\theta)\, d\theta = h\sqrt{2\pi\alpha^2}\, (Z_t - z_t')(\sigma_t^2 + \alpha^2), \qquad \hat\lambda_t^f = \int_a^b \hat\lambda_t(\theta)\, d\theta = h\sqrt{2\pi\alpha^2}\, Z_t. \]

Substitution into equations A.22 and A.23 yields equation 3.23.
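The truncated-gaussian moment identities above are easily checked numerically (illustrative values of \mu, \sigma^2, a, b):

```python
import numpy as np
from math import erf, sqrt, pi, exp

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):  # standard normal pdf
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

mu, sig2 = 0.3, 1.7          # illustrative values
a, b = -1.0, 2.0
sig = sqrt(sig2)
ap, bp = (a - mu) / sig, (b - mu) / sig
Z = Phi(bp) - Phi(ap)
z = phi(bp) - phi(ap)
zp = bp * phi(bp) - ap * phi(ap)

def trapezoid(f, dx):
    return (f.sum() - 0.5 * (f[0] + f[-1])) * dx

x = np.linspace(a, b, 20001)
dx = x[1] - x[0]
dens = np.exp(-(x - mu) ** 2 / (2 * sig2)) / sqrt(2 * pi * sig2)
I1 = trapezoid(dens * (x - mu), dx)
I2 = trapezoid(dens * (x - mu) ** 2, dx)
print(I1, -z * sig)            # first identity
print(I2, (Z - zp) * sig2)     # second identity
```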

Appendix B:  Implementation Details

B.1  State Dynamics

All simulations in this letter use linear dynamics of the form
\[ dX_t = A X_t\, dt + D\, dW_t, \]
(B.1)
which are implemented via a straightforward Euler-Maruyama scheme. Specifically, for step size \Delta t, we approximate X_{k\Delta t} by x_k, where
\[ x_0 = X_0, \qquad x_{k+1} = x_k + A x_k\, \Delta t + D\, \xi_k \sqrt{\Delta t}, \]
and (\xi_k) is a sequence of independent standard normal variables (independent of X_0).
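A minimal sketch of this scheme for a scalar linear system; as an illustrative check, the empirical variance at a time much longer than the relaxation time is compared with the stationary value d^2/(2|a|) (parameter values are arbitrary, not those of any figure):

```python
import numpy as np

rng = np.random.default_rng(4)
a_dyn, d = -1.0, 0.5        # dX_t = a X_t dt + d dW_t (illustrative values)
dt, T, trials = 1e-3, 5.0, 2000
x = np.zeros(trials)        # X_0 = 0 in every trial

for _ in range(int(T / dt)):
    xi = rng.standard_normal(trials)
    x = x + a_dyn * x * dt + d * xi * np.sqrt(dt)   # Euler-Maruyama step

print(x.var())   # ≈ stationary variance d^2 / (2|a|) = 0.125
```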

B.2  Continuous Neural Population

The simulation of marked point processes, used to model continuous neural populations (see section 3.2.2), involves the generation of random times and corresponding random marks. In the case of a finite population, there is a finite number of marks, and the point process corresponding to each mark may be simulated separately. For an infinite population, a different approach is required.

Given the intensity kernel \lambda(X_t; y)\, f(dy), we simulate a marked point process in two stages: first generating the random times (spike times) and then the random marks (neuron parameters). To generate the random times t_1, \ldots, t_{N_T}, note that the total history-conditioned firing rate at time t is given by
\[ r(X_t) \triangleq \int_{\mathcal{Y}} \lambda(X_t; y)\, f(dy), \]
and the unmarked process N_t is a doubly stochastic Poisson process with random rate r(X_t). Conditioned on X and the point-process history, each random mark y_i is distributed as
\[ y_i \mid X, t_1, \ldots, t_i, y_1, \ldots, y_{i-1} \sim \kappa(X_{t_i}, dy) \triangleq \frac{\lambda(X_{t_i}; y)\, f(dy)}{r(X_{t_i})} \]
(see Brémaud, 1981, theorem T6).

Accordingly, the simulation of the marked process N proceeds as follows:

  1. Using the generated trajectory x_k \approx X_{k\Delta t}, simulate a Poisson process with rate r(X_t), yielding the random times (t_1, \ldots, t_{N_T}). This may be accomplished either via direct generation of a point in each time step with probability r(x_k)\Delta t or, more efficiently, via time rescaling (see, e.g., Brown, Barbieri, Ventura, Kass, & Frank, 2002).

  2. Generate random marks (y_1, \ldots, y_{N_T}) by sampling independently from the distributions \kappa(X_{t_i}, dy).

As in section 3.2.3, when h, H, R are fixed across the population, we abuse notation and write f(dh', d\theta, dH', dR') = \delta_h(dh')\, \delta_H(dH')\, \delta_R(dR')\, f(d\theta), and similarly for \kappa(x; d\theta). The function r(x) and the distribution \kappa(x; d\theta) for each of the distributions of preferred stimuli in section 3.2.3 are given in closed form in Table 3. For a finite heterogeneous mixture population, r and \kappa may be obtained through the appropriate weighted summation; however, it is easier to simulate each component separately.
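The two-stage procedure can be exercised directly on the gaussian-population row of Table 3. The sketch below does this for a scalar, static state (a simplifying assumption made only for this illustration; the paper's state is dynamic) and compares the empirical spike rate and mark mean with the closed forms for r(x) and \kappa(x; d\theta):

```python
import numpy as np

rng = np.random.default_rng(5)
# static scalar state observed through a gaussian population (H = 1)
x, h, R, c, G = 1.0, 10.0, 4.0, 0.0, 1.0   # R = 1/alpha^2; f(dtheta) = N(theta; c, G) dtheta
dt, T = 1e-3, 200.0

def gauss_pdf(v, mean, var):
    return np.exp(-(v - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# stage 1: total rate r(x) = h * sqrt(2*pi/R) * N(c; x, 1/R + G)
r = h * np.sqrt(2 * np.pi / R) * gauss_pdf(c, x, 1.0 / R + G)
steps = int(T / dt)
spike = rng.random(steps) < r * dt           # Bernoulli approximation per time step
n_spikes = int(spike.sum())

# stage 2: marks theta ~ kappa(x, dtheta), a gaussian with
# mean (G*R*x + c)/(1 + G*R) and variance G/(1 + G*R) in this scalar case
kappa_mean = (G * R * x + c) / (1 + G * R)
kappa_var = G / (1 + G * R)
thetas = kappa_mean + np.sqrt(kappa_var) * rng.standard_normal(n_spikes)

print(n_spikes / T, r)                # empirical rate vs. closed form
print(thetas.mean(), kappa_mean)      # empirical mark mean vs. closed form
```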

Table 3:
Total Population Rates r(x) and Mark Sampling Distributions \kappa(x; d\theta) for the Preferred Stimulus Distributions f(d\theta) of Section 3.2.3, with Gaussian Tuning \lambda(x; \theta) = h \exp(-\tfrac{1}{2}\|Hx - \theta\|_R^2).

f(d\theta) | r(x) | \kappa(x; d\theta)
\delta_{\theta_0}(d\theta) | \lambda(x; \theta_0) | \delta_{\theta_0}(d\theta)
d\theta | h \sqrt{(2\pi)^m / \det R} | \mathcal{N}(\theta; Hx, R^{-1})\, d\theta
\mathcal{N}(\theta; c, G)\, d\theta | h \sqrt{(2\pi)^m / \det R}\; \mathcal{N}(c; Hx, R^{-1} + G) | \mathcal{N}(\theta;\, G R_G H x + R^{-1} R_G c,\, (R + G^{-1})^{-1})\, d\theta, where R_G = (R^{-1} + G)^{-1}
\mathbf{1}_{a \le \theta \le b}\, d\theta | h \sqrt{2\pi / R}\, [\Phi(z_b(x)) - \Phi(z_a(x))] | \mathcal{N}_{[a,b]}(\theta; Hx, R^{-1})\, d\theta (truncated normal distribution), where z_s(x) = \sqrt{R}\,(s - Hx)

Note: The derivations of these closed forms are straightforward for the Dirac and uniform distributions; the derivation for the gaussian distribution is by multiplication of gaussians, completely paralleling the computations in section A.4.

B.3  Filter

Similar to the state dynamics, we approximate the filter equations
\[ d\mu_t = d\mu_t^\pi + d\mu_t^c + d\mu_t^N, \qquad d\Sigma_t = d\Sigma_t^\pi + d\Sigma_t^c + d\Sigma_t^N, \]
using an Euler approximation,
\[ \mu_{(k+1)\Delta t} = \mu_{k\Delta t} + \left( \frac{d\mu_t^\pi}{dt} + \frac{d\mu_t^c}{dt} \right) \Delta t + d\mu_t^N \Big|_{t = k\Delta t}, \qquad \Sigma_{(k+1)\Delta t} = \Sigma_{k\Delta t} + \left( \frac{d\Sigma_t^\pi}{dt} + \frac{d\Sigma_t^c}{dt} \right) \Delta t + d\Sigma_t^N \Big|_{t = k\Delta t}. \]

For the linear dynamics (see equation B.1) used in simulations throughout this letter, the prior terms dμtπ,dΣtπ are given by equation 3.8. The continuous update terms dμtc,dΣtc depend on the population as described in section 3.2.3. The discontinuous updates dμtN,dΣtN are given by equations 3.16c and 3.16d, and are nonzero only at time steps containing a spike.

Appendix C:  Variance as Proxy for MSE

In section 5, we studied optimal encoding using the posterior variance as a proxy for the MSE. Letting μt,Σt denote the approximate posterior moments given by the filter, the MSE and posterior variance are related as follows,
\[ \mathrm{MSE}_t \triangleq \mathbb{E}\, \mathrm{tr}\!\left[ (X_t - \mu_t)(X_t - \mu_t)^T \right] = \mathbb{E}\left[ E_t^N\, \mathrm{tr}\!\left[ (X_t - \mu_t)(X_t - \mu_t)^T \right] \right] = \mathbb{E}\, \mathrm{tr}\!\left[ \mathrm{Var}_t^N(X_t) \right] + \mathbb{E}\, \mathrm{tr}\!\left[ \left( \mu_t - E_t^N X_t \right) \left( \mu_t - E_t^N X_t \right)^T \right], \]
where E_t^N[\cdot], \mathrm{Var}_t^N[\cdot] are, respectively, the mean and covariance matrix conditioned on \mathcal{N}_t, and \mathrm{tr} is the trace operator. Thus, for an exact filter, having \mu_t = E_t^N[X_t] and \Sigma_t = \mathrm{Var}_t^N[X_t], we would have \mathrm{MSE}_t = \mathbb{E}[\mathrm{tr}\, \Sigma_t]. Conversely, if we find that \mathrm{MSE}_t \approx \mathbb{E}[\mathrm{tr}\, \Sigma_t], it suggests that the errors are small (though this is not guaranteed, since the errors in \mu_t and \Sigma_t may affect the MSE in opposite directions if the variance is underestimated).
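This decomposition is easy to check by Monte Carlo in a toy conjugate-gaussian model (not the model of this letter; purely illustrative): with X ~ N(0, 1) and Y = X + N(0, 1), the exact posterior is N(Y/2, 1/2), and for any Y-measurable estimator the MSE equals the mean posterior variance plus the mean squared deviation from the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200000
x = rng.standard_normal(n)                 # prior X ~ N(0, 1)
y = x + rng.standard_normal(n)             # observation Y = X + N(0, 1)

post_mean, post_var = y / 2, 0.5           # exact posterior of X given Y
mu = 0.6 * y                               # a slightly biased (inexact) estimator

mse = np.mean((x - mu) ** 2)
decomp = post_var + np.mean((mu - post_mean) ** 2)
print(mse, decomp)                         # the two quantities should agree
```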

Figure 13 shows the variance and MSE in estimating the same process as in Figure 10 after averaging across 10,000 trials. The results indicate that the filter has good accuracy with these parameters, so that the variance is a reasonable approximation for the MSE.

Figure 13:

Posterior variance versus MSE when filtering the one-dimensional process dX_t = aX_t\, dt + s\, dW_t of Figure 10. As in Figure 10, parameters on the left are a = -0.05, s = 0.1 and on the right a = -0.05, s = 0.5. The sensory population is gaussian (see equations 3.19 and 3.13) with population parameters c = 0, \sigma_{\mathrm{pop}}^2 = 4 and tuning parameters \alpha = 1, h = 50. The top plots show the normalized measured bias \langle \mu_t - X_t \rangle / \langle \sigma_t \rangle, where \langle \cdot \rangle denotes averaging across trials. The center plots show the MSE \langle (\mu_t - X_t)^2 \rangle and the mean posterior variance \langle \sigma_t^2 \rangle. The bottom plots show the ratio of means \langle (\mu_t - X_t)^2 \rangle / \langle \sigma_t^2 \rangle and the mean ratio \langle (\mu_t - X_t)^2 / \sigma_t^2 \rangle. The means were taken across 10,000 trials. Shaded areas indicate 95% confidence intervals obtained via bootstrapping.

Figure 14 is a variant of Figure 10 showing the MSE rather than the variance. The results are noisier but qualitatively similar. The largest differences are observed in Figure 14b for small population variance, where the ADF estimate is poor due to very few spikes occurring.

Figure 14:

Optimal population distribution depends on prior variance relative to tuning curve width. This figure is based on the same data as Figure 10 in the main text, with MSE plotted instead of estimated posterior variance. See Figure 10 for more details.

Appendix D:  Comparison to Previous Work: Additional Details

As noted in section 6.1, we are aware of a single previous work (Eden & Brown, 2008) deriving a closed-form filter for nonuniform coding of a diffusion process in continuous time. We now present a detailed comparison of our work to Eden and Brown (2008).

The filter derived in Eden and Brown (2008) is a continuous-time version of a discrete-time filter presented in Eden, Frank, Barbieri, Solo, and Brown (2004). The setting of Eden et al. (2004) involves linear discrete-time state dynamics and a finite population of neurons with arbitrary tuning functions, firing independently (given the state). The derivation of the filter relies on an approximation applied to the measurement update for the posterior density,
\[ \log p(X_k \mid N_k) = \log p(X_k \mid N_{k-1}) + \log P(\Delta N_k \mid X_k) + \mathrm{const}, \]

where X_k is the external state at time t_k, N_k is the spiking history up to time t_k, and \Delta N_k is the vector of spike counts in the interval (t_{k-1}, t_k]. A gaussian density is substituted for each of the terms p(X_k \mid N_k), p(X_k \mid N_{k-1}), and each side is expanded to a second-order Taylor series about the point E[X_k \mid N_{k-1}]. This yields equations relating the first two moments of p(X_k \mid N_k) to those of p(X_k \mid N_{k-1}). The time update equations for the first two moments (relating p(X_{k+1} \mid N_k) to p(X_k \mid N_k)) require no approximation, since the dynamics are linear. The resulting discrete-time filtering equations (equations 2.7 to 2.10 in Eden et al., 2004) depend on the gradient and Hessian of the logarithm of the tuning functions at the point E[X_k \mid N_{k-1}]. The continuous-time version (equations 6.3 and 6.4 in Eden & Brown, 2008) is derived by taking the limit as the time discretization step \Delta t approaches 0.

In contrast, our derivation starts from the exact update equations for the first two moments, expressed in terms of posterior expectations. A gaussian posterior is substituted into these expectations, resulting in tractable integrals in the case of gaussian tuning functions; the tractability of these integrals hinges on the gaussian form of the tuning functions.
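To illustrate the tractability, the expected firing rate of a one-dimensional gaussian neuron under a gaussian posterior reduces to a standard gaussian integral with a closed form. The sketch below verifies this numerically; the parameter values are illustrative only, not those of the paper.

```python
import math

# E[lambda(X)] for X ~ N(mu, sig2) and lambda(x) = h * exp(-(x-theta)^2 / (2*alpha2))
# has the closed form (standard gaussian integral):
#   h * sqrt(alpha2 / (alpha2 + sig2)) * exp(-(mu - theta)^2 / (2 * (alpha2 + sig2))).
# Verify against brute-force numerical integration (trapezoidal rule).

def expected_rate_closed(mu, sig2, h, theta, alpha2):
    s = alpha2 + sig2
    return h * math.sqrt(alpha2 / s) * math.exp(-(mu - theta) ** 2 / (2 * s))

def expected_rate_numeric(mu, sig2, h, theta, alpha2, n=200001, half_width=10.0):
    lo, hi = mu - half_width, mu + half_width
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * dx
        gauss = math.exp(-(x - mu) ** 2 / (2 * sig2)) / math.sqrt(2 * math.pi * sig2)
        lam = h * math.exp(-(x - theta) ** 2 / (2 * alpha2))
        w = 0.5 if k in (0, n - 1) else 1.0
        total += w * gauss * lam * dx
    return total

a = expected_rate_closed(0.2, 1.0, 10.0, 1.0, 0.5)
b = expected_rate_numeric(0.2, 1.0, 10.0, 1.0, 0.5)
print(abs(a - b) < 1e-6)  # → True
```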

To compare the resulting filters, we consider a finite population of gaussian neurons: this is the intersection of the setting of the current work with that of Eden and Brown (2008). In this case, the filtering equations in Eden and Brown (2008) yield the same discontinuous update terms, but the continuous update terms take the form
\[
d\mu_t^{c,\mathrm{EB}} = \sum_i \lambda_i(\mu_t)\, \Sigma_t H_i^{\top} R_i \delta_t^i \, dt,
\tag{D.1a}
\]
\[
d\Sigma_t^{c,\mathrm{EB}} = \sum_i \lambda_i(\mu_t)\, \Sigma_t H_i^{\top} \left( R_i - R_i \delta_t^i (\delta_t^i)^{\top} R_i \right) H_i \Sigma_t \, dt,
\tag{D.1b}
\]
where
\[
\delta_t^i \triangleq H_i \mu_t - \theta_i
\]
(see section D.1). These equations differ from equations 3.9a and 3.9b in the use of the tuning function shape matrix $R_i$ in place of $S_t^i$ (defined in equation 3.9f), and $\lambda_i(\mu_t)$ in place of $\hat\lambda_t^i$. The difference between $\lambda_i(\mu_t)$ and $\hat\lambda_t^i$ similarly involves substituting $R_i$ for $S_t^i$,
\[
\lambda_i(\mu_t) = h \exp\!\left( -\tfrac{1}{2} (\delta_t^i)^{\top} R_i \delta_t^i \right),
\qquad
\hat\lambda_t^i = h \sqrt{\frac{|S_t^i|}{|R_i|}} \exp\!\left( -\tfrac{1}{2} (\delta_t^i)^{\top} S_t^i \delta_t^i \right).
\]
Since $S_t^i = \left( R_i^{-1} + H_i \Sigma_t H_i^{\top} \right)^{-1}$, our filtering equations take into account the posterior variance $\Sigma_t$ in several places where it is absent in equations D.1. Note that when $\Sigma_t = 0$, we have $R_i = S_t^i$ and $\lambda_i(\mu_t) = \hat\lambda_t^i$, so the equations become increasingly similar as $\Sigma_t \to 0$. We refer to the filter, equations D.1, as the Eden-Brown (EB) filter.
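A quick numerical illustration of this limit, with arbitrary matrices $H$ and $R$ (not taken from the simulations in this appendix):

```python
import numpy as np

# The filters coincide as the posterior covariance shrinks:
# S_t^i = (R_i^{-1} + H_i Sigma_t H_i^T)^{-1} approaches R_i as Sigma_t -> 0.
# Arbitrary illustrative matrices below.
H = np.array([[1.0, 0.5]])          # 1x2 observation matrix
R = np.array([[2.0]])               # tuning-curve shape matrix

for scale in [1.0, 0.1, 0.01, 0.001]:
    Sigma = scale * np.eye(2)       # posterior covariance shrinking toward zero
    S = np.linalg.inv(np.linalg.inv(R) + H @ Sigma @ H.T)
    print(scale, float(S[0, 0]))    # S -> R[0,0] = 2.0 as scale -> 0
```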

We compared the performance of our filter and equations D.1 in simulations of a simple one-dimensional setup. Figure 15 shows an example of filtering a static one-dimensional state observed through two gaussian neurons (see equation 3.3), using both the ADF approximation, equations 3.9a and 3.9b, and the EB filter, equations D.1a and D.1b, for comparison. Since the mean square error is highly sensitive to outliers where the approximation fails and the filtering error becomes large, we compare the filters by observing the distribution of absolute errors (AE) in estimating the state. Figure 16 compares the AE in estimating the state for the two filters. As expected from our analysis, the ADF filter has an advantage, particularly at early times, when the posterior variance is large. This is seen most clearly in the 95th percentile of AEs in Figure 16a and in the tail histograms in Figure 16c, but a small advantage may also be observed for the median in Figure 16d. Moreover, in some trials, the error in the EB filter remains large throughout.

Figure 15:

An example of a static 1D state observed through two sensory neurons and filtered by ADF, equations 3.9a and 3.9b, and by the EB filter, equations D.1a and D.1b. Each dot corresponds to a spike, with the vertical location indicating the neuron's preferred stimulus $\theta$. The approximate posterior means obtained from ADF and the EB filter are shown in orange and blue, respectively, with the corresponding posterior standard deviations in shaded areas of the respective colors. The curves to the right of the graph show the tuning functions of the two neurons. Parameters are $h = 10$, $H = 1$, $R^{-1} = 0.5$, $\mu_0 = 0$, $\sigma_0^2 = 1$, and the neurons are centered at $-1$ and $1$. The dynamics were discretized with time step $\Delta t = 10^{-3}$.


Figure 16:

Distribution of absolute estimation errors as a function of time, collected from 10,000 trials. Parameters are the same as in Figure 15, with the state sampled from $\mathcal{N}(0,1)$ in each trial. Preferred stimuli are noted above each subfigure. (a) Medians of the distribution of AEs for ADF (orange) and the EB filter (blue), along with 5th and 95th percentiles, as a function of time. The medians are indistinguishable at this scale. (b) Histograms of AEs for some specific times, with logarithmically spaced bins. The vertical dotted line indicates the median AE at filter initialization, $\sigma_0 \Phi^{-1}(3/4)$, the median of $|X_0 - \mu_0|$ under the prior, where $\Phi^{-1}$ is the quantile function of the standard normal distribution. (c) Histogram of AEs larger than the initial median AE. (d) Normalized differences of AE percentiles, $\left( p_{\mathrm{ADF}}^r - p_{\mathrm{EB}}^r \right) / \tfrac{1}{2}\left( p_{\mathrm{ADF}}^r + p_{\mathrm{EB}}^r \right)$, where $p_{\mathrm{ADF}}^r$ and $p_{\mathrm{EB}}^r$ are the $r$th quantiles of the AE distributions for the ADF and EB filters, respectively, for $r = 0.5, 0.05, 0.95$. Negative values indicate an advantage of ADF over EB. Shaded areas indicate 95% confidence intervals derived via bootstrapping (e.g., Efron & Tibshirani, 1994).


In Figure 16 (left) we chose the preferred stimuli $-0.51$ and $0.5$. These are not symmetric around 0 due to a limitation of the EB filter. When it is applied to a homogeneous population while the current posterior mean lies precisely at the average of the population's preferred stimuli, the posterior mean remains constant until the next spike, while the posterior variance evolves as
\[
d\sigma_t^{2,c,\mathrm{EB}} = \sum_i \lambda_i(\mu_t)\, \frac{1}{\alpha^2} \left( 1 - \frac{(\mu_t - \theta_i)^2}{\alpha^2} \right) \sigma_t^4 \, dt = \mathrm{const} \cdot \sigma_t^4 \, dt,
\]
which diverges in finite time if the constant coefficient is positive. The coefficient is positive when the $\theta_i$ are close to $\mu_t$. This causes divergence of the filter when the preferred stimuli are symmetric around the initial estimate $\mu_0$ and sufficiently near it, and the first spike is sufficiently delayed. To avoid this behavior, we chose preferred stimuli that are not symmetric around $\mu_0 = 0$. This asymmetry causes an eventual shift in the posterior mean in the absence of spikes, which suppresses the growth of the posterior variance. This effect may cause high estimation errors before the first spike, as evident in Figure 16d (left).
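The finite-time divergence can be made explicit: separating variables in $d\sigma_t^2 = c\,\sigma_t^4\,dt$ gives $\sigma_t^2 = \sigma_0^2 / (1 - c\,\sigma_0^2\,t)$, which blows up at $t^* = 1/(c\,\sigma_0^2)$. A minimal sketch with hypothetical constants (not the simulation parameters of the text):

```python
# Finite-time blow-up of d(sigma^2)/dt = c * sigma^4 for a positive constant c.
# Closed form by separation of variables: sigma2(t) = sigma2_0 / (1 - c*sigma2_0*t),
# which diverges at t_star = 1 / (c * sigma2_0).

def sigma2_closed_form(t, sigma2_0, c):
    denom = 1.0 - c * sigma2_0 * t
    if denom <= 0:
        raise ValueError("variance has already diverged")
    return sigma2_0 / denom

def sigma2_euler(t_end, sigma2_0, c, dt=1e-6):
    """Euler integration of the variance ODE up to t_end."""
    s2, t = sigma2_0, 0.0
    while t < t_end:
        s2 += c * s2 * s2 * dt
        t += dt
    return s2

sigma2_0, c = 1.0, 2.0          # hypothetical values
t_star = 1.0 / (c * sigma2_0)   # blow-up time: 0.5

# Just before t_star the closed form is already huge, and the Euler
# solution tracks it while it remains finite.
print(sigma2_closed_form(0.4999, sigma2_0, c))   # ≈ 5000 (just before blow-up)
print(sigma2_euler(0.49, sigma2_0, c))           # ≈ 1 / (1 - 0.98) = 50
```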

D.1  Derivation of Equations D.1

In our notation, the filtering equations of Eden and Brown (2008) read
\[
d\mu_t^{c,\mathrm{EB}} = -\sum_i \Sigma_t \nabla \log \lambda_i(\mu_t)\, \lambda_i(\mu_t)\, dt,
\tag{D.2a}
\]
\[
d\Sigma_t^{c,\mathrm{EB}} = -\sum_i \Sigma_t \nabla^2 \lambda_i(\mu_t)\, \Sigma_t \, dt,
\tag{D.2b}
\]
\[
d\mu_t^{N,\mathrm{EB}} = \sum_i \Sigma_{t^+} \nabla \log \lambda_i(\mu_{t^-})\, dN_t^i,
\tag{D.2c}
\]
\[
d\Sigma_t^{N,\mathrm{EB}} = -\sum_i \Sigma_{t^-} S_{t^-}^{\mathrm{EB}} \Sigma_{t^-}\, dN_t^i,
\tag{D.2d}
\]
where $\nabla, \nabla^2$ denote the gradient and Hessian, respectively, and $S_{t^-}^{\mathrm{EB}}$ is defined as
\[
S_t^{\mathrm{EB}} \triangleq \nabla^2 \log \lambda_i(\mu_t) \left( \Sigma_t \nabla^2 \log \lambda_i(\mu_t) - I \right)^{-1}.
\]

Note that this is a corrected version of the definition used in Eden and Brown (2008), which is unusable when the Hessian is singular (this occurs in our model whenever $m < n$). The definition of $S_t^{\mathrm{EB}}$ given here extends it to the singular case.
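A quick numerical check that this definition remains usable with a singular Hessian and agrees with $H_i^{\top} S_t^i H_i$; the matrices below are arbitrary illustrative choices with $m < n$, not the paper's parameters.

```python
import numpy as np

# With m < n the Hessian -H^T R H (n x n, rank m) is singular, so a definition
# requiring its inverse fails; the corrected form,
#   S_EB = Hess @ inv(Sigma @ Hess - I),
# remains well defined and should equal H^T S H with S = inv(R^{-1} + H Sigma H^T).
rng = np.random.default_rng(1)
n, m = 3, 1                              # m < n: singular-Hessian case

H = rng.standard_normal((m, n))
R = np.array([[1.5]])                    # positive-definite shape matrix (scalar here)
B = rng.standard_normal((n, n))
Sigma = B @ B.T + np.eye(n)              # symmetric positive-definite covariance

Hess = -H.T @ R @ H                      # Hessian of log lambda, rank m < n
S_EB = Hess @ np.linalg.inv(Sigma @ Hess - np.eye(n))
S = np.linalg.inv(np.linalg.inv(R) + H @ Sigma @ H.T)
print(np.allclose(S_EB, H.T @ S @ H))  # → True
```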

For a gaussian firing rate, equation 3.3, the relevant gradient and Hessians are given by
\[
\nabla \log \lambda_i(x) = -H_i^{\top} R_i (H_i x - \theta_i),
\qquad
\nabla^2 \log \lambda_i(x) = -H_i^{\top} R_i H_i,
\]
\[
\nabla^2 \lambda_i(x) = -\lambda_i(x)\, H_i^{\top} \left( R_i - R_i (H_i x - \theta_i)(H_i x - \theta_i)^{\top} R_i \right) H_i,
\]
so
\[
\begin{aligned}
S_t^{\mathrm{EB}} &= \nabla^2 \log \lambda_i(\mu_t) \left( \Sigma_t \nabla^2 \log \lambda_i(\mu_t) - I \right)^{-1} \\
&= -H_i^{\top} R_i H_i \left( -\Sigma_t H_i^{\top} R_i H_i - I \right)^{-1} \\
&= H_i^{\top} R_i H_i \left( \Sigma_t^{-1} + H_i^{\top} R_i H_i \right)^{-1} \Sigma_t^{-1} \\
&= H_i^{\top} \left( R_i^{-1} + H_i \Sigma_t H_i^{\top} \right)^{-1} H_i
\qquad \text{(using claim 2)} \\
&= H_i^{\top} S_t^i H_i,
\end{aligned}
\]
where $S_t^i$ is given by equation 3.9f. Substituting into equations D.2 yields the continuous updates, equations D.1a and D.1b, and the discontinuous updates
\[
d\mu_t^{N,\mathrm{EB}} = -\sum_i \Sigma_{t^+} H_i^{\top} R_i \delta_{t^-}^i \, dN_t^i,
\qquad
d\Sigma_t^{N,\mathrm{EB}} = -\sum_i \Sigma_{t^-} H_i^{\top} S_{t^-}^i H_i \Sigma_{t^-}\, dN_t^i.
\]
The discontinuous variance update is identical to the ADF update, equation 3.9c. The discontinuous mean update can be rewritten in terms of $\Sigma_{t^-}$ by noting that
\[
\begin{aligned}
\Sigma_{t^+} H_i^{\top} R_i
&= \left( \Sigma_{t^-} - \Sigma_{t^-} H_i^{\top} S_{t^-}^i H_i \Sigma_{t^-} \right) H_i^{\top} R_i \\
&= \Sigma_{t^-} H_i^{\top} \left( I - S_{t^-}^i H_i \Sigma_{t^-} H_i^{\top} \right) R_i \\
&= \Sigma_{t^-} H_i^{\top} \left( I - \left( R_i^{-1} + H_i \Sigma_{t^-} H_i^{\top} \right)^{-1} H_i \Sigma_{t^-} H_i^{\top} \right) R_i \\
&= \Sigma_{t^-} H_i^{\top} \left( I + R_i H_i \Sigma_{t^-} H_i^{\top} \right)^{-1} R_i
\qquad \text{(Woodbury identity)} \\
&= \Sigma_{t^-} H_i^{\top} S_{t^-}^i,
\end{aligned}
\]
so the mean update may be rewritten as
\[
d\mu_t^{N,\mathrm{EB}} = -\sum_i \Sigma_{t^-} H_i^{\top} S_{t^-}^i \delta_{t^-}^i \, dN_t^i,
\]
in agreement with equation 3.9c.
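This chain of identities can be sanity-checked numerically with arbitrary randomly generated positive-definite matrices (not the paper's parameters):

```python
import numpy as np

# Numerical check of the identity used above:
#   with S = (R^{-1} + H Sigma H^T)^{-1} and
#   Sigma_plus = Sigma - Sigma H^T S H Sigma  (post-spike covariance),
#   we should have  Sigma_plus H^T R == Sigma H^T S.
rng = np.random.default_rng(0)
n, m = 4, 2  # state and observation dimensions (arbitrary choices)

H = rng.standard_normal((m, n))
A = rng.standard_normal((m, m))
R = A @ A.T + m * np.eye(m)        # symmetric positive-definite shape matrix
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)    # symmetric positive-definite covariance

S = np.linalg.inv(np.linalg.inv(R) + H @ Sigma @ H.T)
Sigma_plus = Sigma - Sigma @ H.T @ S @ H @ Sigma

lhs = Sigma_plus @ H.T @ R
rhs = Sigma @ H.T @ S
print(np.allclose(lhs, rhs))  # → True
```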

Acknowledgments

The work of Y.H. and R.M. is partially supported by grant 451/17 from the Israel Science Foundation and by the Ollendorff Center of the Viterbi Faculty of Electrical Engineering at the Technion.

References

Benucci, A., Saleem, A. B., & Carandini, M. (2013). Adaptation maintains population homeostasis in primary visual cortex. Nature Neuroscience, 16(6), 724–729. doi:10.1038/nn.3382

Bethge, M., Rotermund, D., & Pawelzik, K. (2002). Optimal short-term population coding: When Fisher information fails. Neural Computation, 14(10), 2317–2351. doi:10.1162/08997660260293247

Bobrowski, O., Meir, R., & Eldar, Y. (2009). Bayesian filtering in spiking neural networks: Noise, adaptation, and multisensory integration. Neural Computation, 21(5), 1277–1320. doi:10.1162/neco.2008.01-08-692

Brand, A., Behrend, O., Marquardt, T., McAlpine, D., & Grothe, B. (2002). Precise inhibition is essential for microsecond interaural time difference coding. Nature, 417(6888), 543–547. doi:10.1038/417543a

Brémaud, P. (1981). Point processes and queues: Martingale dynamics. New York: Springer.

Brigo, D., Hanzon, B., & Le Gland, F. (1999). Approximate nonlinear filtering by projection on exponential manifolds of densities. Bernoulli, 5(3), 495–534.

Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E., & Frank, L. M. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325–346.

Chelaru, M., & Dragoi, V. (2008). Efficient coding in heterogeneous neuronal populations. Proceedings of the National Academy of Sciences, 105(42), 16344–16349.

Dayan, P., & Abbott, L. (2005). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: MIT Press.

Doucet, A., & Johansen, A. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. In D. Crisan & B. Rozovskii (Eds.), Handbook of nonlinear filtering (pp. 656–704). Oxford: Oxford University Press.

Ecker, A., Berens, P., Tolias, A., & Bethge, M. (2011). The effect of noise correlations in populations of diversely tuned neurons. Journal of Neuroscience, 31(40), 14272–14283.

Eden, U. T., & Brown, E. N. (2008). Continuous-time filters for state estimation from point process models of neural data. Statistica Sinica, 18(4), 1293–1310.

Eden, U. T., Frank, L. M., Barbieri, R., Solo, V., & Brown, E. N. (2004). Dynamic analysis of neural encoding by point process adaptive filtering. Neural Computation, 16, 971–998.

Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Boca Raton, FL: CRC Press.

Frey, R., & Runggaldier, W. J. (2001). A nonlinear filtering approach to volatility estimation with a view towards high frequency data. International Journal of Theoretical and Applied Finance, 4(2), 199–210.

Ganguli, D., & Simoncelli, E. (2014). Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Computation, 26(10), 2103–2134. doi:10.1162/NECO_a_00638

Gill, R. D., & Levit, B. Y. (1995). Applications of the Van Trees inequality: A Bayesian Cramér-Rao bound. Bernoulli, 1(1–2), 59–79.

Harel, Y., Meir, R., & Opper, M. (2015). A tractable approximation to optimal point process filtering: Application to neural encoding. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems, 28 (pp. 1603–1611). Red Hook, NY: Curran.

Harper, N., & McAlpine, D. (2004). Optimal neural population coding of an auditory spatial cue. Nature, 430(7000), 682–686. doi:10.1038/nature02768

Henderson, H., & Searle, S. (1981). On deriving the inverse of a sum of matrices. SIAM Review, 23(1), 53–60.

Huys, Q. J. M., Zemel, R. S., Natarajan, R., & Dayan, P. (2007). Fast population coding. Neural Computation, 19(2), 404–441. doi:10.1162/neco.2007.19.2.404

Komaee, A. (2010). State estimation from space-time point process observations with an application in optical beam tracking. In Proceedings of the IEEE 44th Annual Conference on Information Sciences and Systems. Piscataway, NJ: IEEE. doi:10.1109/CISS.2010.5464949

Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399–402.

Macke, J. H., Buesing, L., & Sahani, M. (2015). Estimating state and parameters in state space models of spike trains. In Z. Chen (Ed.), Advanced state space methods for neural and clinical data. Cambridge: Cambridge University Press.

Maybeck, P. (1979). Stochastic models, estimation, and control. Orlando, FL: Academic Press.

Minka, T. (2001). Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 362–369). San Mateo, CA: Morgan Kaufmann.

Øksendal, B. (2003). Stochastic differential equations. Berlin: Springer.

Opper, M. (1998). A Bayesian approach to online learning. In D. Saad (Ed.), Online learning in neural networks (pp. 363–378). Cambridge: Cambridge University Press.

Pilarski, S., & Pokora, O. (2015). On the Cramér-Rao bound applicability and the role of Fisher information in computational neuroscience. BioSystems, 136, 11–22. doi:10.1016/j.biosystems.2015.07.009

Pillow, J. W., Shlens, J., Paninski, L., Sher, A., Litke, A. M., Chichilnisky, E., & Simoncelli, E. P. (2008). Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207), 995–999.

Radhakrishna Rao, C. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc., 37, 81–91.

Rhodes, I., & Snyder, D. (1977). Estimation and control performance for space-time point-process observations. IEEE Transactions on Automatic Control, 22(3), 338–346. doi:10.1109/TAC.1977.1101534

Segall, A. (1976). Recursive estimation from discrete-time point processes. IEEE Transactions on Information Theory, 22(4), 422–431.

Segall, A. (1978). Centralized and decentralized control schemes for Gauss-Poisson processes. IEEE Transactions on Automatic Control, 23(1), 47–57.

Segall, A., Davis, M. H., & Kailath, T. (1975). Nonlinear filtering with counting observations. IEEE Transactions on Information Theory, 21(2), 143–149.

Segall, A., & Kailath, T. (1975). The modeling of randomly modulated jump processes. IEEE Transactions on Information Theory, 21(2), 135–143.

Snyder, D. (1972). Filtering and detection for doubly stochastic Poisson processes. IEEE Transactions on Information Theory, 18(1), 91–102. doi:10.1109/TIT.1972.1054756

Snyder, D., & Miller, M. (1991). Random point processes in time and space (2nd ed.). Berlin: Springer.

Snyder, D., Rhodes, I., & Hoversten, E. (1977). A separation theorem for stochastic control problems with point-process observations. Automatica, 13(1), 85–87.

Solo, V. (2000). "Unobserved" Monte Carlo method for identification of partially observed nonlinear state space systems, Part II: Counting process observations. In Proceedings of the 39th IEEE Conference on Decision and Control (pp. 3331–3336). Piscataway, NJ: IEEE.

Susemihl, A. (2014). Optimal population coding of dynamic stimuli. Ph.D. diss., Technical University Berlin.

Susemihl, A., Meir, R., & Opper, M. (2011). Analytical results for the error in filtering of gaussian processes. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, & K. Weinberger (Eds.), Advances in neural information processing systems, 24 (pp. 2303–2311). Red Hook, NY: Curran.

Susemihl, A., Meir, R., & Opper, M. (2013). Dynamic state estimation based on Poisson spike trains—towards a theory of optimal encoding. Journal of Statistical Mechanics: Theory and Experiment, 2013(3), P03009.

Susemihl, A., Meir, R., & Opper, M. (2014). Optimal neural codes for control and estimation. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 27 (pp. 2987–2995). Red Hook, NY: Curran.

Twum-Danso, N., & Brockett, R. (2001). Trajectory estimation from place cell data. Neural Networks, 14(6–7), 835–844. doi:10.1016/S0893-6080(01)00079-X

Wang, Z., Stocker, A. A., & Lee, D. D. (2016). Efficient neural codes that minimize Lp reconstruction error. Neural Computation, 28, 2656–2686.

Yaeli, S., & Meir, R. (2010). Error-based analysis of optimal tuning functions explains phenomena observed in sensory neurons. Front. Comput. Neurosci., 4, 130. doi:10.3389/fncom.2010.00130