## Abstract

In a constantly changing world, animals must account for environmental volatility when making decisions. To appropriately discount older, irrelevant information, they need to learn the rate at which the environment changes. We develop an ideal observer model capable of inferring the present state of the environment along with its rate of change. Key to this computation is an update of the posterior probability of all possible change point counts. This computation can be challenging, as the number of possibilities grows rapidly with time. However, we show how the computations can be simplified in the continuum limit by a moment closure approximation. The resulting low-dimensional system can be used to infer the environmental state and change rate with accuracy comparable to the ideal observer. The approximate computations can be performed by a neural network model via a rate-correlation-based plasticity rule. We thus show how optimal observers accumulate evidence in changing environments and map this computation to reduced models that perform inference using plausible neural mechanisms.

## 1 Introduction

Animals continuously make decisions in order to find food, identify mates, and avoid predators. However, the world is seldom static. Information that was critical yesterday may be of little value now. Thus, when accumulating evidence to decide on a course of action, animals weight new evidence more strongly than old (Pearson, Heilbronner, Barack, Hayden, & Platt, 2011). The rate at which the world changes determines the rate at which an individual should discount previous information (Deneve, 2008; Veliz-Cuba, Kilpatrick, & Josić, 2016). For instance, when actively tracking prey, a predator may use visual information obtained only within the last second (Olberg, Worthington, & Venator, 2000; Portugues & Engert, 2009), while social insect colonies integrate evidence that can be hours to days old when deciding on a new home site (Franks, Pratt, Mallon, Britton, & Sumpter, 2002). These environmental state variables (e.g., prey location, best home site) are constantly changing, and the timescale on which the environment changes is unlikely to be known in advance. Thus, to make accurate decisions, animals must learn how rapidly their environment changes (Wilson, Nassar, & Gold, 2010).

Evidence accumulators are often used to model decision processes in static and fluctuating environments (Smith & Ratcliff, 2004; Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006). These models show how noisy observations can be accumulated to provide a probability that one among multiple alternatives is correct (Gold & Shadlen, 2007; Beck et al., 2008). They explain a variety of behavioral data (Ratcliff & McKoon, 2008; Brunton, Botvinick, & Brody, 2013), and electrophysiological recordings suggest that neural activity can reflect the accumulation of evidence (Huk & Shadlen, 2005; Kira, Yang, & Shadlen, 2015). Since normative evidence accumulation models determine the belief of an ideal observer, they also show the best way to integrate noisy sensory measurements and can tell us if and how animals fail to use such information optimally (Bogacz et al., 2006; Beck et al., 2008).

Early decision-making models focused on decisions between two choices in a static environment (Wald & Wolfowitz, 1948; Gold & Shadlen, 2007). Recent studies have extended this work to more ecologically relevant situations, including multiple alternatives (Churchland, Kiani, & Shadlen, 2008; Krajbich & Rangel, 2011), multidimensional environments (Niv, Daniel, Geana, Gershman, Leong, & Radulescu, 2015), and cases where the correct choice (McGuire, Nassar, Gold, & Kable, 2014; Glaze, Kable, & Gold, 2015) or context (Shvartsman, Srivastava, & Cohen, 2015) changes in time. In these cases, normative models are more difficult to derive and analyze (Wilson & Niv, 2011), and their dynamics are more complex. However, methods of sequential and stochastic analysis are still useful in understanding their properties (Wilson et al., 2010; Veliz-Cuba et al., 2016).

We examine the case of a changing environment where an optimal observer discounts prior evidence at a rate determined by environmental volatility. In this work, a model performs optimally if it maximizes the likelihood of predicting the correct environmental state, given the noise in observations (Bogacz et al., 2006). Experiments suggest that humans learn the rate of environmental fluctuations to make choices nearly optimally (Glaze et al., 2015). During dynamic foraging experiments where the choice with the highest reward changes in time, monkeys also appear to use an evidence discounting strategy suited to the environmental change rate (Sugrue, Corrado, & Newsome, 2004).

However, most previous models have assumed that the rate of change of the environment is known ahead of time to the observer (Glaze et al., 2015; Veliz-Cuba et al., 2016). Wilson et al. (2010) developed a model of an observer that infers the rate of environmental change from observations. To do so, the observer computes a joint posterior probability of the state of the environment, the time since the last change in the environment, and a count of the number of times the environment has changed (the change point count). With more measurements, such observers improve their estimates of the change rate and are therefore better able to predict the environmental state. Inference of the change rate is most important when an observer makes fairly noisy measurements and cannot determine the current state from a single observation.

We extend previous accumulator models of decision making to the case of multiple, discrete choices with asymmetric, unknown transition rates between them. We assume that the observer is primarily interested in the current state of the environment, often referred to as the correct choice in decision-making models (Bogacz et al., 2006). Therefore, we show how an ideal observer can use sensory evidence to infer the rates at which the environment transitions between states and simultaneously use these inferred rates to discount old evidence and determine the present environmental state.

Related models have been studied (Wilson et al., 2010; Adams & MacKay, 2007). However, they relied on the assumption that after a change, the new state does not depend on the previous state. This excludes the possibility of a finite number of states. For example, in the case of two choices, knowledge of the present state determines with complete certainty the state after a change, and the two are thus not independent. For cases with a finite number of choices, our algorithm is simpler than previous ones. The observer only needs to compute a joint probability of the environmental state and the change point count.

The storage needed to implement our algorithms grows rapidly with the number of possible environmental states. However, we show that moment closure methods can be used to decrease the needed storage considerably, albeit at the expense of accuracy and the representation of higher-order statistics. Nonetheless, when measurement noise is not too large, these approximations can be used to estimate the most likely transition rate and the current state of the environment. This motivates a physiologically plausible neural implementation for the present computation. We show that a Hebbian learning rule that shapes interactions between multiple neural populations representing the different choices allows a network to integrate inputs nearly optimally. Our work therefore links statistical principles for optimal inference with stochastic neural rate models that can adapt to the environmental volatility to make near-optimal decisions in a changing environment.

## 2 Optimal Evidence Accumulation for Known Transition Rates

We start by revisiting the problem of inferring the current state of the environment from a sequence of noisy observations. We assume that the number of states is finite and the state of the environment changes at times unknown to the observer. We first review the case when the rate of these changes is known to the observer. In later sections, we assume that these rates must also be learned. Following Veliz-Cuba et al. (2016), we derive a recursive equation for the likelihoods of the different states and an approximating stochastic differential equation (SDE). Similar derivations were presented for decisions between two choices by Deneve (2008) and Glaze et al. (2015).

An ideal observer decides between $N$ choices based on successive observations at times $t_{1:n} = (t_1, t_2, \ldots, t_n)$. We denote each possible choice by $H^i$, $i = 1, \ldots, N$, with $H_n$ being the correct choice at time $t_n$. The transition rates $\epsilon^{ij}$, $i \neq j$, correspond to the known probabilities that the state changes between two observations: $\epsilon^{ij} = P(H_n = H^i \mid H_{n-1} = H^j)$. The observer makes measurements, $\xi_{1:n}$, at times $t_{1:n}$ with known conditional probability densities $f^i(\xi) = P(\xi_n = \xi \mid H_n = H^i)$. Here, and elsewhere, we assume that the observations are conditionally independent. We also abuse notation slightly by using $P(\cdot)$ to denote a probability, or the value of a probability density function, depending on the argument. We use explicit notation for the probability density function when there is a potential for confusion.

We denote by $\xi_{1:n}$ the vector of observations and by $P_n^i$ the conditional probability $P_n^i = P(H_n = H^i \mid \xi_{1:n})$. To make a decision, the observer can compute the index that maximizes the posterior probability, $\hat{\imath} = \operatorname{argmax}_i P(H_n = H^i \mid \xi_{1:n})$. Therefore, $H^{\hat{\imath}}$ is the most probable state, given the observations $\xi_{1:n}$.
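This recursion can be written compactly: propagate the previous posterior through the transition matrix, weight by the likelihood of the new observation, and normalize. A minimal sketch, assuming Gaussian likelihoods and a symmetric two-state transition matrix (the parameter and observation values below are hypothetical):

```python
import numpy as np

def gaussian_lik(xi, mu, sigma):
    """Likelihood f^i(xi) of an observation under each state i."""
    return np.exp(-0.5 * ((xi - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def update_posterior(P_prev, xi, mus, sigma, eps):
    """One step of the recursive posterior update with known transition
    matrix eps[i, j] = P(H_n = H^i | H_{n-1} = H^j)."""
    prior = eps @ P_prev                  # propagate belief one observation forward
    lik = gaussian_lik(xi, mus, sigma)    # f^i(xi) for each state i
    P = lik * prior
    return P / P.sum()                    # normalize

# two states with means -1 and +1, symmetric change probability 0.1 per step
mus = np.array([-1.0, 1.0])
eps = np.array([[0.9, 0.1], [0.1, 0.9]])
P = np.array([0.5, 0.5])                  # flat prior over states
for xi in [0.8, 1.2, 0.9, -1.1]:          # example observation sequence
    P = update_posterior(P, xi, mus, sigma=1.0, eps=eps)
```

The same recursion applies for any finite number of states and for asymmetric transition matrices, as long as the rates are known.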

The nonlinear term in equation 2.2 implies that in the absence of noise, the system has a stable fixed point and older evidence is discounted. Such continuum models of evidence accumulation are useful because they are amenable to the methods of stochastic analysis (Bogacz et al., 2006). Linearization of the SDE provides insights into the system’s local dynamics (Glaze et al., 2015; Veliz-Cuba et al., 2016) and can be used to implement the inference process in model neural networks (Bogacz et al., 2006; Veliz-Cuba et al., 2016).
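The discounting dynamics can be simulated directly by Euler–Maruyama integration. The sketch below assumes the two-alternative log-odds SDE takes the nonlinear form $dy = [g - 2\epsilon\sinh(y)]\,dt + \rho\,dW$ familiar from this literature (cf. Glaze et al., 2015; Veliz-Cuba et al., 2016); the drift and noise values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, dt, T = 1.0, 1e-3, 5.0        # change rate, time step, horizon
g, rho = 2.0, 1.0                  # constant evidence drift; noise amplitude

y = 0.0
ys = []
for _ in range(int(T / dt)):
    # drift from incoming evidence minus nonlinear discounting of old evidence
    y += (g - 2 * eps * np.sinh(y)) * dt + rho * np.sqrt(dt) * rng.standard_normal()
    ys.append(y)

# with constant drift, y fluctuates around the stable fixed point g = 2*eps*sinh(y*)
y_star = np.arcsinh(g / (2 * eps))
```

The $\sinh$ term pulls $y$ back toward the fixed point, which is the stability property described above: older evidence is forgotten at a rate set by $\epsilon$.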

We next extend this approach to the case when the observer must infer the transition rates, $\epsilon^{ij}$, from measurements.

## 3 Environments with Symmetric Transition Rates

We first derive the ideal observer model when the unknown transition rates are symmetric, that is, when $\epsilon^{ij} = \epsilon$ for all $i \neq j$, and $\epsilon^{ii} = 1 - (N-1)\epsilon$. This simplifies the derivation, since the observer only needs to estimate a single change-point count. The asymmetric case discussed in section 4 follows the same idea, but the derivation is more involved, since the observer must estimate multiple counts.

Our problem differs from previous studies in two key ways (Adams & MacKay, 2007; Wilson et al., 2010): First, we assume the observer tries to identify the most likely state of the environment at time $t_n$. To do so, the observer computes the joint conditional probability, $P(H_n = H^i, a \mid \xi_{1:n})$, of the current state, $H_n$, and the number of environmental changes, $a$, since beginning the observations. Previous studies focused on obtaining the predictive distribution of future observations, $P(\xi_{n+1} \mid \xi_{1:n})$. The two distributions are closely related, as the predictive distribution can be obtained by marginalizing the joint posterior over states and change-point counts.

Adams and MacKay (2007) and Wilson et al. (2010) derive a probability update equation for the run length and the number of change points and use this equation to obtain the predictive distribution of future observations. We show that it is not necessary to compute run-length probabilities when the number of environmental states is finite. Instead, we derive a recursive equation for the joint probability of the current state, $H_n$, and the number of change points, $a$. As a result, the total number of possible pairs $(H_n, a)$ grows as $Nn$ (linearly in $n$), where $N$ is the fixed number of environmental states, rather than quadratically in $n$ as in Wilson et al. (2010).^{1}

### 3.1 Symmetric Two-State Process

We first derive a recursive equation for the probability of two alternatives, $H^\pm$, in a changing environment, where the change process is memoryless and the change rate, $\epsilon$, is symmetric and initially unknown to the observer (see Figure 1A). The most probable choice given the observations up to time $t_n$ can be obtained from the log of the posterior odds ratio, $y_n = \log \left[ P(H_n = H^+ \mid \xi_{1:n}) / P(H_n = H^- \mid \xi_{1:n}) \right]$. The sign of $y_n$ indicates which option is more likely, and its magnitude indicates the strength of this evidence (Bogacz et al., 2006; Gold & Shadlen, 2007). Old evidence should be discounted according to the inferred environmental volatility. Since this volatility is unknown, an ideal observer computes a probability distribution for the change rate, $\epsilon$ (see Figure 1C), along with the probability of environmental states.

Let $a_n$ be the number of change points and $b_n$ the count of non-change-points between times $t_1$ and $t_n$ (see Figure 1B). The count $a_n$ is a pure birth process with birth rate $\epsilon$. The observer assumes no changes prior to the start of observation, $t_1$, and must make at least two observations, $\xi_1$ and $\xi_2$, to detect a change.
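The resulting recursion over the joint posterior of the current state and the change-point count can be sketched concretely. Marginalizing a flat prior on $\epsilon$ gives a predictive change probability of $(a+1)/(m+2)$ after observing $a$ changes among $m$ transitions (the mean of a Beta$(a+1,\, m-a+1)$ posterior). A sketch for two states with Gaussian likelihoods; the observation values are hypothetical:

```python
import numpy as np

def gauss(xi, mu, sigma=1.0):
    """Unnormalized Gaussian likelihood f^i(xi); normalization cancels."""
    return np.exp(-0.5 * ((xi - mu) / sigma) ** 2)

def joint_update(P, m, xi, mus):
    """One update of P[i, a], the joint posterior over current state i and
    change-point count a, after m transitions have already been observed.
    Flat prior on eps => predictive change probability (a + 1) / (m + 2)."""
    newP = np.zeros_like(P)
    for a in range(P.shape[1]):
        stay = max(m - a + 1, 0) / (m + 2) * P[:, a]               # no change
        switch = a / (m + 2) * P[::-1, a - 1] if a >= 1 else 0.0   # change: flip state
        newP[:, a] = gauss(xi, mus) * (stay + switch)
    return newP / newP.sum()

mus = np.array([-1.0, 1.0])              # state means (SNR = 2)
obs = [1.2, 0.8, -0.9, -1.3, 1.1]        # hypothetical observations; two sign flips
P = np.zeros((2, len(obs)))
P[:, 0] = gauss(obs[0], mus)             # flat prior over states, zero changes so far
P /= P.sum()
for m, xi in enumerate(obs[1:]):         # m transitions precede this observation
    P = joint_update(P, m, xi, mus)

state_post = P.sum(axis=1)               # marginal over the current state
count_post = P.sum(axis=0)               # marginal over the change-point count
```

With this observation sequence, `count_post` should place most of its mass away from $a = 0$, since the observations change sign twice.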

We conjecture that when measurements are noisy, the posterior distribution over the change rate does not converge to a point mass at the true rate, $\epsilon$, in the limit of infinitely many observations, $n \to \infty$; that is, the estimate of $\epsilon$ is not consistent. As we have shown, to infer the rate, we need to infer the parameter of a Bernoulli variable. It is easy to show that the posterior over this parameter converges to a point mass at the actual rate value if the probability of misclassifying the state is known to the observer (Djuric & Huang, 2000). However, when the misclassification probability is not known, the variance of the posterior remains positive even in the limit of infinitely many observations. In our case, when measurements are noisy, the observer does not know the exact number of change points at any finite time. Hence, the observer does not know exactly how to weight previous observations to make an inference about the current state. As a result, the probability of misclassifying the current state may not be known. We conjecture that this implies that even in the limit $n \to \infty$, the posterior over $\epsilon$ has positive variance (see Figure 2D).

In Figure 3, we compare the performance of this algorithm in three cases: when the observer knows the true rate (point mass prior over the true rate $\epsilon$), when the observer assumes a wrong rate (point mass prior over an erroneous rate), and when the observer learns the rate from measurements (flat prior over $\epsilon$). We define performance as the probability of a correct decision.

Under the interrogation protocol, the observer infers the state of the environment at a fixed time. As expected, performance increases with interrogation time and is highest if the observer uses the true rate (see Figure 3A and equation 2.1). Performance plateaus quickly when the observer assumes a fixed rate and more slowly if the rate is learned. The performance of observers who learn the rate increases slowly toward that of observers who know the true rate. In Figure 3B, we present the performance of the unknown-rate algorithm at four different times and compare it to the asymptotic values obtained with different assumed rates (green curves).

Note that an observer who assumes an incorrect change rate can still perform near optimally (e.g., curve for 0.03 in Figure 3A), especially when the signal-to-noise ratio (SNR) is quite high. The SNR is the difference in means of the likelihoods divided by their common standard deviation. Change rate inference is more effective at lower SNR values, in which case multiple observations are needed for an accurate estimate of the present state. However, at very low SNR values, the observer will not be able to substantially reduce uncertainty about the change rate, resulting in high uncertainty about the state.
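Using the document's definition of SNR, the difficulty of single-observation classification can be made explicit: for equal-variance Gaussian likelihoods, the maximum-likelihood rule misclassifies a single observation with probability $\Phi(-\mathrm{SNR}/2)$. A small check (the means and standard deviation below are hypothetical):

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def single_obs_error(mu0, mu1, sigma):
    """Misclassification probability of the ML rule from one observation,
    for equal-variance Gaussian likelihoods."""
    snr = abs(mu1 - mu0) / sigma
    return phi(-snr / 2.0)

# at SNR = 2, a single observation is wrong about 16% of the time,
# so multiple observations (and hence change rate inference) matter
err = single_obs_error(-1.0, 1.0, 1.0)
```

At high SNR a single observation nearly determines the state, which is why rate inference matters most at low to moderate SNR.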

In the free response protocol, the observer makes a decision when the log-odds ratio reaches a predefined threshold. In Figure 3C, we present simulation results for this protocol in a format similar to Figure 3A, with empirical performance as a function of average hitting time. Each performance level corresponds to a unique log-odds threshold. As in the interrogation protocol (see Figure 3A), performance saturates much more quickly for an observer who fixes the change rate estimate than for one who infers this rate over time.
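A sketch of the free response protocol for an observer who knows the change rate: the discrete log-odds update combines the log-likelihood ratio of each observation with a nonlinearly discounted prior, the standard form for two alternatives (Glaze et al., 2015; Veliz-Cuba et al., 2016). All parameter values are hypothetical:

```python
import numpy as np

def discount(y, eps):
    """Discount prior log odds y for a known change probability eps per step."""
    return np.log(((1 - eps) * np.exp(y) + eps) / ((1 - eps) + eps * np.exp(y)))

def free_response(theta, eps=0.05, snr=1.0, seed=0):
    """Accumulate log odds until |y| crosses theta. The environment is held
    fixed in state H^+ for simplicity, so a positive decision is correct."""
    rng = np.random.default_rng(seed)
    y, n = 0.0, 0
    while abs(y) < theta:
        n += 1
        xi = rng.normal(snr / 2.0, 1.0)      # observation drawn under H^+
        y = snr * xi + discount(y, eps)      # log-likelihood ratio + discounted prior
    return y > 0, n

frac_correct = np.mean([free_response(theta=3.0, seed=s)[0] for s in range(100)])
```

Because the discounted prior is bounded by $\log[(1-\epsilon)/\epsilon]$, old evidence is forgotten and performance saturates as the threshold grows, consistent with Figure 3C.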

### 3.2 Symmetric Multistate Process

We next consider evidence accumulation in an environment with an arbitrary number of states, $N$, with symmetric transition probabilities, $\epsilon^{ij} = \epsilon$, whenever $i \neq j$. We define $\epsilon := \epsilon^{ij}$ for any $i \neq j$, so that the probability of remaining in the same state becomes $\epsilon^{ii} = 1 - (N-1)\epsilon$, for all $i$. The symmetry in transition rates means that an observer still needs only to track the total number of change points, $a$, as in section 3.1.

## 4 Environments with Asymmetric Transition Rates

We will show that our inference algorithm assigns positive probability only to change-point matrices, $a$, that correspond to possible transition paths between the states $H^1, \ldots, H^N$. Many nonnegative integer matrices with entries that sum to $n - 1$ are not possible change-point matrices. A combinatorial argument shows that when $N = 2$, the number of possible pairs $(H_n, a)$ grows quadratically with the number of steps, $n$, to leading order. It can also be shown that the growth is polynomial for $N > 2$, although we do not know the growth rate in general (see Figure 4B). An ideal observer has to assign a probability to each of these pairs, which is much more demanding than in the symmetric-rate case, where the number of possible states grows linearly in $n$.
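The set of reachable (state, change-point matrix) pairs can be enumerated directly by stepping all possible paths forward, which illustrates the growth rates above. A sketch (representation and helper names are ours):

```python
import math

def reachable_pairs(N, n):
    """Count pairs (current state, change-point matrix) that some length-n
    state path can produce; entry [i][j] counts observed j -> i transitions
    (diagonal entries count non-changes)."""
    zero = tuple(tuple(0 for _ in range(N)) for _ in range(N))
    frontier = {(s, zero) for s in range(N)}          # any initial state
    for _ in range(n - 1):
        nxt = set()
        for s, a in frontier:
            for t in range(N):                        # environment moves s -> t
                b = [list(row) for row in a]
                b[t][s] += 1
                nxt.add((t, tuple(tuple(r) for r in b)))
        frontier = nxt
    return len(frontier)

counts = [reachable_pairs(2, n) for n in range(1, 9)]
# all nonnegative 2x2 integer matrices with entries summing to n - 1, times 2 states:
all_pairs = [2 * math.comb(n - 1 + 3, 3) for n in range(1, 9)]
```

For $N = 2$, `counts` grows quadratically in $n$ while `all_pairs` grows cubically, showing why restricting attention to reachable matrices saves considerable storage.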

We compute the conditional probability of the current state by marginalizing over all possible change-point matrices, $a$. To do so, we relate the joint probabilities of state and change-point matrix at consecutive observation times. Note that if the observer assumes the columns of $a$ are independent prior to any observations, then the exit rates conditioned on the change-point counts, $\epsilon^{ij}$, are independent for all states, $H^j$.

To motivate the derivation, we first consider a single state, $H^j$, and assume that the environmental state has been observed perfectly over $n$ time steps, but the transition rates are unknown. Therefore, all change-point counts $a^{ij}$ are known to the observer, but the rates $\epsilon^{ij}$ are not. The state of the system at time $t_n$, given that it was in state $H^j$ at time $t_{n-1}$, is a categorical random variable with $P(H_n = H^i \mid H_{n-1} = H^j) = \epsilon^{ij}$, for $i = 1, \ldots, N$. The observed transitions are independent samples from a categorical distribution with unknown parameters $\epsilon^{1j}, \ldots, \epsilon^{Nj}$.

The conjugate prior to the categorical distribution is the Dirichlet distribution, and we therefore use it as a prior on the change-point probabilities. For simplicity, we again assume a flat prior over the transition probabilities out of state $H^j$, that is, $p(\epsilon^{1j}, \ldots, \epsilon^{Nj}) \propto \mathbb{1}_{\Delta^{N-1}}(\epsilon^{1j}, \ldots, \epsilon^{Nj})$, where $\mathbb{1}_{\Delta^{N-1}}$ is the indicator function on the standard simplex, $\Delta^{N-1} = \{ x \in \mathbb{R}^N : x_i \geq 0, \sum_{i=1}^N x_i = 1 \}$.
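For a perfectly observed sequence, the Dirichlet posterior update reduces to adding observed transition counts to the prior pseudocounts. A sketch with a flat Dirichlet prior and a hypothetical three-state sequence:

```python
import numpy as np

# perfectly observed state sequence (hypothetical), states in {0, 1, 2}
seq = [0, 0, 1, 1, 1, 2, 0, 0, 1, 2, 2, 0]

N = 3
counts = np.zeros((N, N))                 # counts[i, j]: observed j -> i transitions
for prev, cur in zip(seq, seq[1:]):
    counts[cur, prev] += 1

# flat prior = Dirichlet(1, ..., 1); the posterior over column j of the
# transition matrix is Dirichlet(1 + counts[:, j])
alpha = 1.0 + counts
post_mean = alpha / alpha.sum(axis=0)     # posterior-mean transition probabilities
```

Each column of `post_mean` estimates the exit probabilities from one state, and each column sums to 1.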

This algorithm can be used to infer unequal transition rates, as shown in Figure 4. Figures 4C through 4E show that the mode of the joint posterior distribution over the transition rates approaches the correct rates, while its variance decreases. As in section 3.1, we conjecture that this joint density does not converge to a point mass at the true rate values unless the SNR is infinite.

## 5 Continuum Limits and Stochastic Differential Equation Models

We next derive continuum limits of the discrete probability update equations for the symmetric case discussed in section 3. We assume that observers make measurements rapidly, so we can derive a stochastic differential equation (SDE) that models the update of an ideal observer’s belief (Gold & Shadlen, 2007). SDEs are generally easier to analyze than their discrete counterparts (Gardiner, 2004). For example, response times can be studied by examining mean first passage times of log-likelihood ratios (Bogacz et al., 2006) or log likelihoods (McMillen & Holmes, 2006), which is much more easily done in the continuum limit (Redner, 2001). For simplicity, we begin with an analysis of the two-state process and then extend our results to the multistate case. The full inference model in the two-state case (see Figure 5A) can be reduced using moment closure, which truncates the resulting infinite system of SDEs to an approximate finite system (see Figure 5B). This both saves computation time and suggests a potential mechanism for learning the rate of environmental change. We map this approximation to a neural population model in section 6 (see Figure 5C). This model consists of populations that track the environmental state and synaptic weights that learn the transition rate $\epsilon$.

### 5.1 Derivation of the Continuum Limit

#### 5.1.1 Two-State Symmetric Process

We first assume that the state of the environment, $H(t)$, is a homogeneous continuous-time Markov chain with state-space $\{H^+, H^-\}$. The probability of transitions between the two states is symmetric, with transition rate $\epsilon$. The number of change points, $a(t)$, up to time $t$ is then a Poisson process with rate $\epsilon$. An observer infers the present state from a sequence of observations, $\xi_{1:n}$, made at equally spaced times,^{2} with spacing $\Delta t = t_{j+1} - t_j$. Each observation, $\xi_j$, has probability density $f^\pm(\xi) = P(\xi_j = \xi \mid H(t_j) = H^\pm)$ (see Veliz-Cuba et al., 2016, for more details). We again use the notation $\xi_{1:j}$ for the first $j$ observations, where $t_j$ is the time of the $j$th observation.

Using an approximation valid for small $\Delta t$ yields the continuum limit.^{3} Since the proportionality constant is equal for all states, we drop it in the SDE for the log-likelihood (see Veliz-Cuba et al., 2016, for details of the derivation).

#### 5.1.2 Two States with Asymmetric Rates

Next we consider the case where the state of the environment, $H(t)$, is still a continuous-time Markov chain with state-space $\{H^+, H^-\}$, but the probabilities of transition between the two states are asymmetric, with rates $\epsilon^{+-} \neq \epsilon^{-+}$. Thus, we must separately enumerate the change points, $a^{+-}$ and $a^{-+}$, to obtain estimates of the rates $\epsilon^{+-}$ and $\epsilon^{-+}$. In addition, we will rescale the enumeration of non-change-points by $\Delta t$, in anticipation of the divergence of the non-change-point counts as $\Delta t \to 0$. This means the total dwell time in each state will be continuous, while the change-point counts will remain discrete. These quantities are then placed into a matrix, $a$, where $a^{ij}$ is the change-point count for $i \neq j$ and the diagonal entries are the dwell times. Note that if the number of change points, $a^{ij}$, and the total dwell time in state $H^j$, $b^j$, were known, the change rate could be estimated as $\epsilon^{ij} \approx a^{ij}/b^j$.
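The estimate of a rate as a change-point count divided by a dwell time can be read off directly from a perfectly observed path. A sketch on a fine time grid (the path and grid spacing are hypothetical):

```python
import numpy as np

# a perfectly observed two-state path on a fine grid (hypothetical example)
dt = 0.01
path = np.array([0] * 200 + [1] * 50 + [0] * 300 + [1] * 80 + [0] * 120)

switches_01 = int(np.sum((path[1:] == 1) & (path[:-1] == 0)))  # 0 -> 1 change points
switches_10 = int(np.sum((path[1:] == 0) & (path[:-1] == 1)))  # 1 -> 0 change points
dwell_0 = float(np.sum(path[:-1] == 0)) * dt                   # time spent in state 0
dwell_1 = float(np.sum(path[:-1] == 1)) * dt                   # time spent in state 1

eps_01 = switches_01 / dwell_0     # estimated 0 -> 1 rate (changes per unit time)
eps_10 = switches_10 / dwell_1     # estimated 1 -> 0 rate
```

With noisy observations the counts and dwell times are themselves uncertain, which is why the full algorithm must carry a distribution over the matrix $a$.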

#### 5.1.3 Multiple States with Symmetric Rates

The continuum limit in the case of $N$ states, $H^1, \ldots, H^N$, with symmetric transition rates can be derived as in the two-state case (see section A.4 for details). Again, denote the transition probabilities by $\epsilon^{ij}$ and the total rate of switching from one state to any other by $\epsilon$.

Equation 5.12 is again an infinite set of stochastic differential equations, one for each pair $(H^i, a)$, $i = 1, \ldots, N$, $a = 0, 1, 2, \ldots$. We have some freedom in choosing initial conditions at $t = 0$. For example, since the change-point count is again a Poisson process with rate $\epsilon$, we can use the Poisson distribution discussed in the case of two states.

### 5.2 Moment Hierarchy for the Two-State Process

In the previous section, we approximated the evolution of the joint probabilities of environmental states and change-point counts. The result, in the symmetric case, was an infinite set of SDEs, one for each combination of state and change-point values $(H^\pm, a)$. However, an observer is mainly concerned with the current state of the environment. The change-point count is important for this inference but may not be of direct interest itself. We next derive simpler, approximate models that do not track the entire joint distribution over all change-point counts, only essential aspects of this distribution. We do so by deriving a hierarchy of iterative equations for the moments of the distribution of change-point counts, $a$, focusing specifically on the two-state symmetric case.
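In outline (the notation here is ours, and the closure shown is only one plausible choice), the hierarchy is built from moments of the change-point count conditioned on each state; each moment equation couples to the next-higher moment, so the hierarchy must be truncated:

```latex
% Conditional moments of the change-point count a, for each state H^{\pm}:
%   m^{(k)}_{\pm}(t) = \sum_{a \ge 0} a^{k}\, P\!\left(H_t = H^{\pm}, a \mid \xi_{1:t}\right).
% The equation for m^{(k)} involves m^{(k+1)}, producing an infinite hierarchy.
% One closure consistent with a Poisson initial condition assumes the count
% distribution stays near-Poisson, so the second moment is slaved to the first:
\begin{aligned}
  m^{(0)}_{\pm} &= \sum_{a \ge 0} P(H^{\pm}, a), &
  m^{(1)}_{\pm} &= \sum_{a \ge 0} a\, P(H^{\pm}, a), \\
  m^{(2)}_{\pm} &\approx m^{(1)}_{\pm}
    + \frac{\bigl(m^{(1)}_{\pm}\bigr)^{2}}{m^{(0)}_{\pm}}.
\end{aligned}
```

Truncating at the first moment leaves a low-dimensional system: the state probabilities $m^{(0)}_{\pm}$ and the conditional mean change-point counts, from which the rate estimate follows.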