## Abstract

The Eriksen task is a classical paradigm that explores the effects of competing sensory inputs on response tendencies and the nature of selective attention in controlling these processes. In this task, conflicting flanker stimuli interfere with the processing of a central target, especially on short reaction time trials. This task has been modeled by neural networks and more recently by a normative Bayesian account. Here, we analyze the dynamics of the Bayesian models, which are nonlinear, coupled discrete time dynamical systems, by considering simplified, approximate systems that are linear and decoupled. Analytical solutions of these allow us to describe how posterior probabilities and psychometric functions depend on model parameters. We compare our results with numerical simulations of the original models and derive fits to experimental data, showing that agreements are rather good. We also investigate continuum limits of these simplified dynamical systems and demonstrate that Bayesian updating is closely related to a drift-diffusion process, whose implementation in neural network models has been extensively studied. This provides insight into how neural substrates can implement Bayesian computations.

## 1. Introduction

The psychological (Laming, 1968; Ratcliff, 1978; Ratcliff, Van Zandt, & McKoon, 1999) and neural bases of decision making (Platt & Glimcher, 2001; Schall, 2001; Gold & Shadlen, 2001) have been widely studied, particularly in constrained situations such as the two-alternative forced-choice (2AFC) task. In 2AFC, subjects are required to discriminate a stimulus and give one of two permissible responses. The sequential probability ratio test (SPRT) is optimal for 2AFC tasks, whether the objective is to minimize the mean reaction time (RT) for a desired accuracy level (Wald & Wolfowitz, 1948) or to minimize a linear cost function in accuracy and detection delay under the Bayesian formulation (Liu & Blostein, 1992). The SPRT compares the relative likelihoods of noisy inputs given two possible hypotheses and reaches a decision when the cumulative evidence for one of them exceeds a fixed threshold. Performance on 2AFC tasks seems broadly consistent with the SPRT (Ratcliff & Smith, 2004), and there is evidence that competing neural populations subserving decision making may implement a strategy close to the SPRT (Gold & Shadlen, 2001, 2002; Schall, 2001; Shadlen & Newsome, 2001; Roitman & Shadlen, 2002; Schall, Stuphorn, & Brown, 2002). Moreover, the continuum limit of SPRT is an analytically tractable drift-diffusion model (DDM) (Holmes et al., 2005), which yields explicit expressions for error rates and reaction times that can be used to investigate reward-rate maximization in 2AFC (Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006), and it has been shown that various neural network models of decision making (Cohen, Dunbar, & McClelland, 1990; Cohen, Servan-Schreiber, & McClelland, 1992; Usher & McClelland, 2001) can be reduced to variants of the DDM (Bogacz et al., 2006).

The Eriksen flanker task (Eriksen & Eriksen, 1974) is an extension of the classical 2AFC task in which the decision is complicated by potentially conflicting distractor inputs. Subjects are required to discriminate a central target stimulus (e.g., the letter H or S) flanked by distractors. Flankers can be either compatible with the central target (e.g., HHHHH) or incompatible (e.g., HHSHH). Subjects display a compatibility effect, being typically slower and less accurate on incompatible than compatible trials (Eriksen & Eriksen, 1974). Furthermore, subjects perform at worse than chance level for short RTs for incompatible trials only. This dip in accuracy implies that flanker interference is particularly potent shortly after stimulus presentation. Figure 1 shows data from two instances of a deadlined Eriksen task. While specific details of reaction time distributions and relationships between accuracy and reaction time differ between the two studies, the basic compatibility effect and the dip in accuracy on incompatible trials are prominent in both.

Since the Eriksen task extends the standard 2AFC task, we suspect that optimal policy in this case is similar to the SPRT. In this vein, Yu, Dayan, and Cohen (in press) modeled the computations underlying the Eriksen task as iterative Bayesian updating, with the decision being made (and the trial terminated) when the cumulative posterior for one of the two possible target stimuli exceeds a fixed threshold. It was also proposed that the apparent suboptimality in performance can be explained by either an incorrect prior on the relative frequency of compatible and incompatible trials (compatibility bias model) or by inherent spatial overlap of visual processing neurons (spatial uncertainty model; Yu et al., in press). Here we reduce the Bayesian models to simpler dynamical systems and study them analytically and numerically. While the simpler models closely approximate the original ones in dynamics and perfomance, their analytical tractability yields explicit expressions for the dependence of inferential and psychometric quantities on model parameters. We discuss the relationship between exact Bayesian inference and drift-diffusion processes, emphasizing the link that this establishes between Bayesian updating and the neural substrates that may execute it. Our analysis also reveals the formal similarity of computations underlying the compatibility bias and spatial uncertainty models, which were motivated by disparate experimental literature and were formulated differently within the Bayesian framework.

The article is organized as follows. After reviewing the Bayesian inference models in section 2, in section 3 we derive and analyze the simplified models: uncoupled, linear discrete dynamical systems. From these, we derive explicit criteria on parameters that predict the dip in accuracy for incompatible trials, and we compare accuracies and RT distributions generated by the full and simplified models. In section 4 we show that the DDM is a continuum limit of the simplified models, and from this, we derive analytical predictions for mean posterior probabilities. We also compute accuracy versus time curves and reaction time distributions under an approximation that violates the first passage threshold crossing criterion adopted in Yu et al. (in press) but permits explicit analysis, and we provide direct comparisons between behavioral data and predictions of the full and approximate compatibility bias models. Section 5 contains a summary and discussion.

## 2. A Bayesian Framework for the Eriksen Task

*M*= 1 if compatible,

*M*= 2 if incompatible), and the stochastic relationship between the trial type

*M*and the stimuli

**X**, and between the stimuli and the noisy inputs into the visual system

**X**. For simplicity, it was assumed that there are three stimuli, , for left, center, and right, respectively; and each one of three neural units or populations responds to one stimulus. Here the pairs of left and right flankers are combined in

*s*

_{1}and

*s*

_{3}, respectively, and we assume that all three inputs contain independent noise, among the units and populations and over time. Using integers

*s*= ±1 to denote S and H, and

_{i}*M*= 1, 2 to denote compatible and incompatible trials, respectively, we may formally describe the process as For the compatibility bias model, the prior probability β for compatible trials is assumed to be higher than the “true” value 0.5, and the inputs are taken to be normally distributed as a function of their respective stimuli and independent of neighboring stimuli: that is, at each step

*t*, the

*x*(

_{i}*t*) are independently drawn from normal distributions with means

*s*and standard deviations σ. We denote this procedure below by .

_{i}*i*∈ {−1, +1},

*j*∈ {1, 2}. Based on Bayes' rule, this yields our inference model: four discrete-time dynamical equations, coupled through normalization, with initial conditions To make a decision based on the accumulating inputs, we compare the cumulative marginal posterior probability, against a decision threshold

*q*, a policy closely related to the SPRT (Wald, 1947). As soon as

*P*(

*s*

_{2}=

*i*∣

**X**

_{t}) exceeds

*q*for

*i*= −1 or

*i*= +1, the system chooses the corresponding response (H or S) and terminates observations for the current trial. The computation for the marginal posterior probability over compatibility is analogous:

*P*(

*M*=

*j*∣

**X**

_{t}) =

*z*

^{t}_{−1,j}+

*z*

^{t}_{1,j}.

Examples of accuracies and RTs thus predicted are shown in Figure 4. For these and other calculations, unless otherwise noted, we adopt the parameter values used in Yu et al. (in press): σ = 9 for the compatibility bias model and σ_{1} = 7, σ_{2} = 5, *a*_{1} = 0.7, *a*_{2} = 0.3 for the spatial uncertainty model, and *q* = 0.9 for both.

## 3. Linearization and Parametric Dependence

Yu et al. (in press) showed that certain choices of parameters allow both the compatibility bias and spatial uncertainty models to capture key properties of the behavioral data in Figure 1 (see Figure 4 below). Here we derive general constraints on the parameters in each model that allow them to reproduce the behavioral data: σ for the compatibility bias model; *a*_{1}, *a*_{2}, σ_{1}, and σ_{2} for the spatial uncertainty model; and *n*, the number of distractors. While we cannot analyze the complex relationship between accuracy and reaction time directly, we wish to at least constrain parameters so that the mean posterior probability for *s*_{2} = 1 (the correct answer) dips below that for *s*_{2} = −1 after one or a few time steps of observations. Although the relative probability of a correct response at time *t* depends not just on the mean but also on higher-order moments, such an analysis would illuminate the magnitude and range of the effective parameters. Unfortunately, even this partial analysis is difficult for the original Bayesian model, as *P*(*s*_{2} ∣ **X**_{t}) involves the summation of two exponential functions of the inputs, as in equation 2.11, and there is no obvious way to derive the expectation of *P*(*s*_{2} ∣ **X**_{t}) as an explicit function of the parameters that specify the generation of the inputs **X**.

Due to such computational intractability, we instead work with a linearized approximation to the exact posterior update rule of equation 2.9. We will motivate and describe the approximations for the two Bayesian models and demonstrate via simulations that the parametric constraints derived from this approximate scheme provide useful bounds for the original Bayesian models.

### 3.1. The Compatibility Bias Model.

*s*can take on the value ±1. We now derive an approximation to equation 3.1 that is linear in the

_{i}*x*(

_{i}*t*)'s. With the following definition, the likelihood function for

*s*

_{2}= 1 and

*M*= 1 (i.e.,

*s*

_{1}=

*s*

_{2}=

*s*

_{3}= 1) can be approximated as follows: The first step uses the fact that quadratic and constant terms cancel in the ratios. The next two rely on Taylor series expansion of the exponential terms and the binomial series approximation: The approximation is justified by the fact that

*x*(

_{k}*t*) ∈ [−1 − 2σ, 1 + 2σ] with >99% probability if we can assume that σ ≫ 1. This latter assumption is reasonable since we are modeling the timescale at which, on average, many time steps of inputs are needed to perform the discrimination.

_{1}Θ

_{2}Θ

_{3}/8 in the numerators and denominator of equation 3.3 have canceled. Initial conditions are as in equation 2.10. Since this simplified system is still nonlinearly coupled through the denominator

*D*, we work with the joint probability instead. The two are related as follows: The joint probability

_{t}*v*obeys the uncoupled update rule: where the sign preceding each

^{t}_{i,j}*x*depends on

_{i}*i*and

*j*as in equation 3.3. As is apparent in equation 3.5,

*z*can be obtained by normalizing

_{ij}^{t}*v*on time step

_{ij}^{t}*t*, but

*v*cannot be used directly in the perceptual decision process, since a fixed threshold in the posterior probability space has no equivalent fixed value in the joint posterior space. However,

_{ij}^{t}*v*is sufficient for deriving bounds on the generative parameters that on average make the posterior probability for

_{ij}^{t}*s*

_{2}= 1 dip below that for

*s*

_{2}= −1 after one time step, when the inputs are generated from the incompatible stimulus array:

**X**= (−1, 1, − 1) (the analysis for

**X**= (1, − 1, 1) is analogous). Specifically, since

*P*(

*s*

_{2},

*M*∣

**X**

_{t}) =

*p*(

*s*

_{2},

*M*,

**X**

_{t})/

*p*(

**X**

_{t}), the condition 〈

*z*

^{1}

_{1, 1}+

*z*

^{1}

_{1,2}〉 < 〈

*z*

^{1}

_{−1, 1}+

*z*

^{1}

_{−1, 2}〉 is equivalent to 〈

*v*

^{1}

_{1, 1}〉 + 〈

*v*

^{1}

_{1, 2}〉 < 〈

*v*

^{1}

_{−1, 1}〉 + 〈

*v*

^{1}

_{−1, 2}〉. We therefore require since the mean values of

*x*

_{1},

*x*

_{2}, and

*x*

_{3}are −1, 1, and −1, respectively, and the compatible terms are weighted by the compatibility prior bias β (and the incompatible ones weighted by 1 − β).

Hence, β > 3/4 is the necessary and sufficient condition for the average posterior probability for *s*_{2} = 1 to dip below that for *s*_{2} = −1 after one observation, when the true stimuli are the incompatible array (−1, 1, −1). More generally, we can show that the constraint is β > (n + 1)/(2n), where *n* is the total number of flankers. This makes intuitive sense, for it suggests that the dip is more prominent or more likely to happen when the subject's prior compatibility bias is stronger or the number of flankers is larger. Indeed, there are behavioral data suggesting that flanker interference effects are stronger when there is a lower frequency of incompatible trials (Gratton, Coles, & Donchin, 1992).

These analytical constraints guarantee a dip only in the posterior probability. As shown in Figure 2 (left), for a particular set of model parameters, the mean accuracy in compatible trials terminating within 20 time steps steadily decreases as a function of β, and it drops below .5, indicating the presence of a dip, for all values of β > 0.82: somewhat higher than β = 0.75, the lower bound of inequality (see equation 3.7) that results in a dip in posterior probability.

A major factor underlying the discrepancy between the two constraints is that we considered only the mean of the posterior probability, not the full distribution. The mean accuracy depends not only on the mean posterior value but also on higher moments. If the distribution were symmetric about its mean, the dip in the mean posterior would directly translate into a dip in accuracy, but as we will show in section 4, the distribution of the posterior trajectories is strongly skewed, and the interaction of that skewness with the decision threshold also plays a role in determining the presence of the dip in accuracy versus reaction time.

A second reason for the discrepancy is that the theoretical bounds are for the dip to occur in the posterior after one time step, whereas in the numerical simlations, due to the infrequency of responses at very short RTs, all trials that terminate within the first 20 time steps were used to estimate accuracy. If the temporal extent of the dip in the posterior distribution is very small (which is likely in boundary cases), then conditional accuracy may not fall below 0.5 when averaged over 20 time steps. The numerically obtained constraints are therefore likely to be more conservative than the analytical approximations.

### 3.2. The Spatial Uncertainty Model.

*D*is again the sum of the numerators and the parameters

^{′}_{t}*A*are Since the prior distribution is uniform, the initial conditions for equation 3.9 are Again, the constraint, is satisfied if

_{i}*A*

_{4}(

*a*

_{1}− 2

*a*

_{2}) < 2

*A*

_{3}(

*a*

_{1}−

*a*

_{2}), or, equivalently, if the ratio of means,

*a*

_{1}/

*a*

_{2}, lies in the interval where is the ratio of the variances. Intuitively, if the ratio

*a*

_{1}/

*a*

_{2}is too large, the flankers have negligible effects; if it is too small, the inputs lose their spatial selectivity altogether. More generally, if there are

*n*flankers, the range is We now compare these constraints with numerical simulations of the full inference model for the specific noise parameters (σ

_{1}= 7, σ

_{2}= 5). We simulated the full model using a range of values of

*a*

_{1}and

*a*

_{2}(with their sum held at 1) and obtained accuracy of all responses falling within the first 20 time steps as a function of

*a*

_{1}/

*a*

_{2}. As can be seen in Figure 2 (right), the accuracy in this short-RT bin is less than .5 when

*a*

_{1}/

*a*

_{2}falls within (0.70, 3.55), a somewhat more stringent condition than the analytically derived (approximate) interval (0.67, 3.98).

### 3.3. Evaluating the Cost of Linearization.

Direct simulations of the linear approximation can be compared with those of the original inference model. Figure 3 shows the results for the compatibility bias model for a particular setting of parameters (σ = 9), comparing the full inference model with the simplified iteration of equation 3.6. The same sequence of noisy observations *x _{i}*(

*t*) was used for both processes, and in computing the value of for the latter at each time step

*t*, normalization was applied only at that step. The agreement is remarkably good, validating our linear approximations to the products of probabilities (5-4) developed in section 3. The quality of the linear approximation for the spatial uncertainty model is similarly good (details not shown).

We can also simulate perceptual discrimination based on the linearized evidence accumulation process, using the first passage criterion for threshold crossing appropriate for free response conditions. As in Yu et al. (in press), we adopt the decision threshold *q* = 0.9 for both the compatibility bias and the spatial uncertainty model. The time span, taken here as 200 steps, is divided into 10 bins, and sample paths for the full model, equation 2.9, and the approximate decoupled system, equation 3.6, and its analogue for spatial uncertainty are computed. The decoupled results are then normalized by dividing by the sum ∑_{i, j}*v _{i, j}^{t}* at each

*t*in the current bin (normalization is not applied for steps 1 through

*t*− 1). The same (unit) step size is used in all cases. Responses are logged when the first of the probabilities

*P*(

*s*

_{2}= +1 ∣

**X**

_{t}) =

*P*(

*s*

_{2}= +1,

*M*= 1 ∣

**X**

_{t}) +

*P*(

*s*

_{2}= +1,

*M*= 2 ∣

**X**

_{t}) or

*P*(

*s*

_{2}= −1 ∣

**X**

_{t}) =

*P*(

*s*

_{2}= −1,

*M*= 1 ∣

**X**

_{t}) +

*P*(

*s*

_{2}= −1,

*M*= 2 ∣

**X**

_{t}) crosses

*q*. After collecting sufficiently many paths (2000 in this case), response time histograms are formed and the fraction of correct responses in each bin summed to yield accuracy versus time curves.

Figure 4 illustrates the results of such simulations for the compatibility and spatial uncertainty models. Accuracy versus reaction time, and empirical distributions of reaction time, are shown for both the full and approximate models. The approximate systems reproduce the characteristic dip in accuracy for fast, incompatible trials for both models, and the accuracy curves and reaction time distributions predicted by the approximate theory agree well with those of the full inference models. Note that the use of the first passage criterion for response produces reaction time distributions that agree with the exact model in details of their shapes: a rise at short reactions times to a peak, followed by a long tail. The distributions for incompatible trials are also flatter and shifted rightward compared to those for compatible trials, as in the data of Figure 1.

## 4. A Continuum Limit

The key difficulty in working with the discrete dynamical systems 3.3 and 3.9 lies in the nonlinear coupling of the posteriors *z _{i, j}^{t}* through the denominators

*D*and

_{t}*D*

_{t}^{′}. It can be proved that individual sample paths generated with the same noise inputs are identical whether computed by iteration of equations 3.3 and 3.9 or by the analogous uncoupled systems, equation 3.6, with posteriors normalized only at the last time step (cf. equation 3.5). (In computing the values for the approximate model 3.6 at each step

*t*for Figure 3, normalization was applied only at that step, but not at steps 1 through

*t*− 1, while the full iteration, equation 2.9, is normalized at every step.) However, it does not follow that we may average over many realizations of the unnormalized process and then normalize (as discussed further in Section 4.3), since these operations do not commute. Nonetheless, we can decouple the dynamics by replacing the normalization constant

*D*at each time step with its expectation 〈

_{t}*D*〉, which does not depend on the inputs, and by replacing that in turn by a constant. We then take continuum limits of the resulting decoupled linear systems to form stochastic differential equations (SDEs), allowing us to use simple analytical results to compute properties of interest. As described in section 5, these SDEs may be related to neurally based models of evidence accumulation.

_{t}### 4.1. Approximating the Denominators.

*D*〉 for the compatibility bias model: where the approximation comes from assuming that the input-dependent terms (functions of

_{t}*x*(

_{k}*t*)) are independent from the

*z*terms, which depend on the previous inputs

_{ij}**x**

_{k}(1), …,

**x**

_{k}(

*t*). Although the inputs are conditionally independent (cf. equation 2.5), they are marginally dependent. That is, if previous inputs favored a particular setting of

*s*

_{2}and

*M*, the current one also tends to do the same. For analytical simplicity, we ignore this statistical dependence. Note that in the limit as

*t*→ ∞, one of the

*z*'s (corresponding to the actual stimulus setting) converges to 1 (and the others to 0), and that no matter which

_{i, j}^{t}*z*it is, More generally, we expect 〈

_{i, j}^{t}*D*〉 to increase from 1 (

_{t}*D*

_{0}is just the sum of the priors) to , where μ denotes the mean value of the

*x*'s. Figure 5 shows exactly this for both compatible and incompatible stimuli for a particular setting of the model parameters, where

_{j}*s*

_{2}= 1 and we have averaged over 10

^{5}trials. Convergence is slower for incompatible stimuli, since the compatibility prior takes time to update from its initial value

*P*(

*M*) = 0.9.

*D*can exhibit large variance on individual trials, we assume

_{t}*D*≈ 〈

_{t}*D*〉 ≈ 1, and approximate the dynamics of equation 3.3 by the following linear, decoupled system: with initial conditions Similiar reasoning can be used to derive a linear, decoupled approximation for equation 3.9 for the spatial uncertainty model. The approximate dynamics for both models can be written as an iterated linear mapping in the following form, where the random variables η(

_{t}*t*) are drawn from a standard normal distribution, and

*a*and

_{i, j}*b*are constant parameters whose values depend on the model, the probability being computed, and the compatibility condition of the given trial.

_{i, j}*s*

_{2}= 1, we have and if is incompatible and

*s*

_{2}= 1, we have For

*s*

_{2}= −1, all the signs in

*a*above are reversed.

_{i, j}*s*

_{2}= 1, the calculations of section 3.2 imply: and for an incompatible stimulus array and

*s*

_{2}= 1: In both cases, the standard deviation of the noise is given by Figure 6 illustrates normal distributions from which these multiplicative terms in equation 4.4 are drawn.

### 4.2. Taking the Continuum Limit.

*z*represent the four posteriors . For finite but small δ

_{i, j}^{t}*t*= 1/

*k*, this represents a finer-grained discretization in which

*k*steps are taken for every one step of equation 4.4, the deterministic increments being of order δ

*t*and the random ones of order (Higham, 2001). Taking the limit δ

*t*→ 0 in equation 4.10, letting

*y*= log(

_{i, j}*z*), and appealing to the Ito formula (Oksendal, 2002, sec. 4.1), we obtain independent, uncoupled SDEs for

_{i, j}*y*(

_{i, j}*t*): with constant coefficients and

*B*=

_{i, j}*b*, whose values are specified in section 4.1. Since each

_{i, j}*z*(

_{i, j}*t*) represents a posterior probability, it should take values in the interval [0, 1], so we shall be interested in sample paths

*y*(

_{i, j}*t*) that start at

*y*(0) < 0 and satisfy −∞ <

_{i, j}*y*(

_{i, j}*t*) ≤ 0.

### 4.3. Analytical Approximations for the Mean Posteriors.

*y*(0) = μ

_{0}and

*t*= 0, the probability density function of

*y*at time

*t*is the following gaussian distribution: where (Here and below, we drop the subscripts {

*i, j*} in

*y*and

*z*in the understanding that the appropriate coefficients will be used in the final formulas.) We now transform back into

*z*-space, using

*y*= log(

*z*) and to obtain the density: The inverse transformation

*z*= exp(

*y*) takes the gaussian distribution over

*y*into a function skewed toward

*z*= 1, as illustrated in Figure 7.

*y*takes positive values on

*y*> 0 for all

*t*> 0. This presents a problem, since

*z*= exp(

*y*) > 1 for

*y*> 0, contrary to

*z*'s designation as a probability measure. Therefore, when computing expected values of , which requires integration of the quantity

*z p*(

*z, t*), we replace all values of

*z*> 1 by

*z*= 1 (or values of

*y*> 0 by

*y*= 0 in the equivalent integral over

*y*). However, to retain analytical tractability, we continue to assume a gaussian distribution over

*y*at time

*t*when generating the distribution at time

*t*+ 1—that is, we replace the inappropriate values of

*y*(or

*x*) only in the integral, not in the underlying drift-diffusion process. The expected (mean) value of

*z*is therefore approximated as which may be evaluated as explained in appendix A to yield Substituting values appropriate for the compatibility bias model from equations 4.5 and 4.6 for the parameters

*a*and

_{i, j}*b*, and hence for

_{i, j}*A*,

_{i, j}*B*, and via equations 4.13, for μ(

_{i, j}*t*) and σ(

*t*), we obtain estimates for the four mean posterior probabilities at time

*t*: where

*D*(

*t*) is the sum of all four probabilities that normalizes the expressions, and for compatible stimuli the functions μ(

*t*) and σ(

*t*) are and for incompatible stimuli, Here, we also use the fact that all sample paths start with the initial conditions specified in equation 2.10 and that μ

_{0}= μ(0) = log(

*z*(0)).

As noted at the beginning of this section, normalization and averaging do not commute. This may be understood in terms of the distributions of Figure 7 as follows. While each sample path can be computed for the uncoupled processes and normalized at time *t* to yield the same result as a sample path of the coupled system (cf. Figure 3), different normalization factors must typically be applied to the values of different paths *z _{i, j}*(

*t*) at each time

*t*. This would distort the distributions

*p*(

*z*), thereby changing their means. However, we may appeal to the observation that the expected value of the denominator remains close to 1 (cf. Figure 5) to conclude that this distortion is likely to be small, and proceed by dividing by the sums of the four mean probability trajectory values at time

_{i, j}, t*t*to normalize the resulting expressions.

Typical results for mean posterior probabilities are shown in Figure 8. The approximate predictions developed above are shown as dashed curves and the results of averaging over 5000 simulated trials of the full inference model, equation 2.9, are shown solid; compatible and incompatible trials are shown in dark and light, respectively. As above, we compute 200 steps for the discrete iteration of the full system, and we evaluate the corresponding quantities for *t* ∈ [0, 200] time units from the formulas above. For *P*(*M*) = 0.5 (not shown), joint posteriors for correct responses increase similarly for both compatible and incompatible cases, but *P*(*M*) = 0.9 elicts markedly different behaviors (top left). The compatibility posteriors show a general rise for compatible stimuli and a monotonic fall for incompatible stimuli, but the posterior probability shows a significant dip below 0.5 at early times for incompatible stimuli, while it rises monotonically for compatible stimuli. As discussed in section 5, the resulting accuracies exhibit similar patterns to the experimental data, with the incompatible case showing a dip in accuracy for early responses. Evolutions of the four individual posterior probabilities are shown in the lower panels of Figure 8.

Figure 8 illustrates that while the approximations developed here do not capture all the detailed behavior of the full model, they do provide reasonably good approximations to the average evolutions of the posteriors over the course of a trial. Timescales are slightly misestimated, and the compatibility posterior (top right) fails to reproduce the slight dip below 0.9 that occurs for compatible trials at early times, but the relative orderings of all the posteriors are correctly predicted. Overall, absolute errors in mean posteriors, computed as described at the end of this section, lie between 0.002 and 0.05, the largest being for in the case of incompatible stimuli (top right, lower curves).

*a*and

*b*from equations 4.7 to 4.9, and using the initial conditions μ

_{0}= log(1/4) for all four posteriors (see equation 3.11). For compatible stimuli, the function μ(

*t*) is for incompatible stimuli, and in both cases, The above results, presented in Figure 9, are not as good as those for the compatibility bias model. Nonetheless, the approximate model captures the key features of the evolving posteriors in the full model rather well, predicting the relative ordering of the posteriors appropriately in all cases except the incorrect choices

*P*(HHH) and

*P*(SHS) for incompatible stimuli; in that case, the approximation for

*P*(SHS) diverges from the correct function, increasing rather than decreasing as

*t*increases (lower right panel), for an absolute error of 0.12. Apart from this case, however, errors lie between 0.015 and 0.08.

### 4.4. Making Use of Explicit Mean Posteriors.

In addition to providing explicit expressions for posterior probabilities, the continuum limit also yields approximations for accuracy and reaction time distributions. To estimate accuracy as a function of response time under the free response protocol assumed by Yu et al. (in press), we compute the fraction of mass of the evolving probability density *p*(*z ^{t}*

_{i,1},

*z*

^{t}_{i,2}) that exceeds a given threshold

*z*

^{t}_{i,1}+

*z*

^{t}_{i,2}=

*q*at each time

*t*(recall equation 2.11). This procedure overestimates first passage times, since some of the sample paths that lie beyond the threshold

*q*at time

*t*may have crossed at earlier times, but it permits some analytical simplification. Without loss of generality, we shall assume that

*s*

_{2}= 1.

*p*(

*z*,

_{j}*t*) =

*p*(

*z*

^{t}_{1,j}), and the approximation comes from assuming

*p*(

*z*

^{t}_{1,1},

*z*

^{t}_{1,2}) ≈

*p*(

*z*

^{t}_{1,1})

*p*(

*z*

^{t}_{1,2}) for the uncoupled and linearized approximate dynamical system. This assumption greatly simplifies the computations, although the uncoupled processes are not entirely independent since they are activated by common inputs (

*x*

_{1},

*x*

_{2},

*x*

_{3}), albeit in different linear combinations. We also note that the variables

*z*should be nonnegative (cf. Figure 7). The domain of integration is pictured in Figure 12. The

_{j}*p*(

*z*,

_{j}*t*)'s take the forms derived in section 4.3, and since each is a normalized gaussian in the logarithmic

*y*variables, the integral of their product over the entire positive quadrant is 1. Hence, we have which is evaluated in appendix A to yield where Unfortunately, the final integral in equation 4.26 cannot be computed analytically, but it can be evaluated accurately and rapidly by numerical methods.

*z*

^{t}_{1,1}+

*z*

^{t}_{1,2}+

*z*

^{t}_{2,1}+

*z*

^{t}_{2,2}. (The term

*P*(

*s*

_{2}= 2 ∣

**X**

_{t})

_{est}is computed in a similar manner to equation 4.26, with the appropriate expressions for μ(

*t*), σ(

*t*) from section 4.3.) The denominator is the cumulative reaction time, and so its derivative with respect to

*t*provides the reaction time distribution. Hence, both accuracy and reaction time distributions can be approximated semianalytically. Figure 10 shows the resulting approximations to the mean posteriors for the compatibility bias model for a particular setting of model parameters. The dip in accuracy for incompatible trials is reproduced, and after an initial rise in accuracy for compatible trials, accuracy slowly declines.

As we have noted, sample paths of the SDE, equation 4.11, may pass across *q* and back, possibly repeatedly, in the interval (0, *t*), so these results do not directly correspond to the first-passage decision policy of the Bayesian models in Yu, Dayan, and Cohen (in press). This accounts for differences between the accuracy curves and reaction time distributions of Figure 1 and the free response results of section 3.3. For example, the compatibility bias-free response data of Figure 4 do not show the mild decline in accuracy for later compatible trials of Figure 10, although the spatial uncertainty simulations of Figure 4 do show such a decline. Nonetheless, the qualitative agreement between Figures 10 and 4 is quite good, and since the semiexplicit expression of equations 4.26 and 4.27 replaces the lengthy Monte Carlo simulations of section 3.3, it may be helpful in guiding parameter fits to data.

The posterior probability expressions can also be used to constrain parameter choices by requiring the derivative of at time *t* = 0 to be negative and finding corresponding conditions on the parameters. The results of this computation (details not shown) agree closely with those in section 3.

### 4.5. Fitting the Models to Data.

We now briefly describe the results of fitting the full models of section 2 and the reduced DD processes of sections 4.2 to 4.4 to the data of Servan-Schreiber et al. (1998), reproduced in Figure 1B. For the compatibility bias model, the parameters fitted are the noise level σ, prior β, threshold *q*, and step durations δ*t* (for DDM) and δ*t* (for the full model), which determine the overall timescale. For spatial uncertainty, they are σ_{1}, σ_{2}, *a*_{1}, *q*, and δ*t*, δ*t* (as in section 3.2, we set *a*_{2} = 1 − *a*_{1}). To these we add one further parameter, *T*_{0}, to account for time occupied by sensory decoding and motor response mechanisms, which superimposes a rightward shift on the RT distributions. (Such an “overhead time” might approximate the mean RT on a simple target detection task.)

We employ the same weighted Euclidean error norm as in Liu, Holmes, and Cohen (2008); see appendix A.2, below, for details. The parameter values obtained are as follows: compatibility bias: σ = 6.5, β = 0.87, *q* = 0.98, δ*t* = 0.95 ms, Δ*T* = 1.04 ms, and *T*_{0} = 90 ms; spatial uncertainty: σ_{1} = 6.9, σ_{2} = 5.1, *a*_{1} = 0.71, *q* = 0.92, δ*t* = 3.4 ms, Δ*T* = 0.33 ms, and *T*_{0} = 95 ms. Note that the noise levels are consistent with the assumptions of sections 3, 4.1, and 4.2: for example, 1/σ^{4} ≪ 1/σ^{2} (cf. equation 3.2). The fitting errors are as follows: compatibility bias: full model 2.5; DDM 2.3; spatial uncertainty: full model 2.1; DDM 1.8. In fitting, we excluded data points in the first (0–100 ms) and the last (900–1000 ms) of the 10 RT bins, since no accuracy data are available for the former, and all trials in which responses exceeded 1000 ms were placed in the latter (note the uptick in RT distributions at the right-most data point). However, we computed model data in that bin and in the next one (1000–1100 ms). Since our fitted values of the overhead time *T*_{0} push even the shortest model RTs beyond 100 ms, accuracies cannot be computed for the 0–100 ms bin unless we assume some premature responses that are initiated before stimulus onset. For such premature responses, the equal prevalence of H and S in the experiments ensures that accuracy approaches chance at very short decision times (cf. the upper left panels of Figures 8 and 9). Indeed, this chance performance is unavoidable, independent of the inference or decision strategy, since the response is deprived of stimulus information and cannot possibly correlate with stimulus identity.

These results are shown in Figure 11. Fit qualities are slightly better for the spatial uncertainty model, and in both cases, perhaps surprisingly, fit errors are slightly smaller for the reduced DDM than for the full Bayesian procedure. The fit errors are similar to that of 2.4 obtained in Liu et al. (2008) for the Gratton et al. (1988) data (see Figure 1A), using a DDM with variable drift rates derived from the neural network model of Cohen et al. (1992). That model contains eight free parameters, compared with five and six, respectively, in the present cases. Indeed, in Liu et al. (2008), six parameters are required to describe drift rates in the compatible and incompatible cases, modeling progressive increase in attention to the central stimulus, and these cases are fitted separately. In this study, compatible and incompatible trials are fitted simultaneously, and a single parameter in each model (the compatability prior β, or the weight *a*_{1}), along with Bayesian updating, serves to describe the accumulation of evidence.

Both models underestimate mean RTs for compatible trials, producing an excess of points in the 200–250 ms RT bin. They are also unable to capture the drop in accuracy at the shortest RTs on compatible trials (left panels) due to the *T*_{0} behavior noted above. They do reproduce this drop on incompatible trials, although the full compatibility bias model does not exhibit the dip below 50%. The spatial uncertainty model is substantially better in this regard (lower right panel), although it underestimates accuracy in the 400–900 ms part of the RT range for both the compatible and incompatible cases. In preliminary work, we also tried a modified norm that preferentially weights low RT data: this slightly improved fits of RT distributions but did not affect compatible accuracy fits. We also fitted the full and DD models to the data of Gratton et al. (1988; see Figure 1A) obtaining similar fit qualities, although the failure to capture the steady rise from 50% accuracy at low RTs for compatible trials was more striking in that case (model results not shown here).

We note that individual subjects exhibit large differences in signal-to-noise ratios and thresholds (in DDM fits; cf. Ratcliff et al., 1999; Bogacz, Hu, Cohen, & Holmes 2007), and that here we have averaged over all subjects to produce single sets of fit parameters for each model. As illustrated in Figure 1, there is also substantial variability in Eriksen data, perhaps due to differing deadlining protocols. (Deadlines are necessary to produce enough short reaction times and hence obtain a significant dip in accuracy on incompatible trials.) The resulting variability in motor preparation times can affect reaction times, and no allowance for this is made in the inference model, which describes only cognitive processing. Our additional parameter *T*_{0} only partially accounts for this, and in this case, it deprives us of accuracy data in the smallest RT bin.

## 5. Discussion and Conclusions

In Liu et al. (2008), a neural network model of the Eriksen task (Cohen et al., 1992; Servan-Schreiber et al., 1998) was linearized and reduced to a DDM with time-varying drift, allowing relatively complete analysis that reveals how parameters influence accuracy curves such as those of Figure 1. However, this network model involves somewhat arbitrary assumptions on architecture and parameters, and it is not clear how the DDM reduction of Liu et al., with its variable drift rate, relates to the optimal decision theory for the constant drift case (Bogacz et al., 2006). This article addresses this issue by offering analytically tractable approximations to two Bayesian inference models (compatibility bias and spatial uncertainty) proposed in Yu et al. (in press).

Specifically, the joint signal probability distribution of equation 2.4 is approximated as a linear sum, and then, by assuming that the sum of the nonnormalized posteriors remains close to one and taking a continuum limit, we obtain analytical expressions for the mean posterior probabilities. Employing a further approximation in which the net probabilities of having answered correctly or incorrectly at time *t* are computed, we derive semianalytical approximations for accuracy and reaction time distributions. While the latter correspond more closely to an “interrogation protocol” (Bogacz et al., 2006; Liu et al., 2008) in which subjects are cued to respond at specific times, and so differ quantitatively from those computed numerically for free responses (compare Figure 10 with Figure 4), the overall accuracy curves and individual posteriors derived from the continuum model reproduce those of the Bayesian model quite well (see Figures 8 and 9).

We therefore expect that our analytical approximations will be useful in guiding parameter selection when fitting models to experimental data. In section 3, we provide an example of this by deriving simple parametric constraints that must hold to obtain the dip below 50% in the posterior probability for early responses. Moreover, although the coefficients differ, the linearized update rules of both equations 3.3 and 3.9 demonstrate that the flanker inputs *x*_{1} and *x*_{3} work with the target input *x*_{2} for the compatible hypotheses and against it for the incompatible hypotheses. This underlying computational architecture gives rise to the same basic ability of both the compatibility bias and spatial uncertainty models to account for the dynamics of flanker interference in behavioral data. In section 4.5, we show that both the original models and DDM approximations derived from them can be fitted to experimental data, further strengthening our case.

Our analysis also reveals that a particularly simple stochastic differential equation, the constant-drift diffusion (DD) process of equation 4.11, approximately describes the evolution of Bayesian posteriors in log probability space. As described in Bogacz et al. (2006), this is a continuum limit of the sequential probability ratio test (Wald, 1947), which is known to be optimal for identifying noisy signals in two-alternative choice tasks (Wald & Wolfowitz, 1948). Moreover, it has been shown (Bogacz et al., 2006; Liu et al., 2008) that DD and related Ornstein-Uhlenbeck processes emerge naturally in linearized reductions of competing leaky accumulator models (Usher & McClelland, 2001) for 2AFC. In these neural networks, the difference between activities in a pair of units at the output decision or response stage behaves like the accumulating variable *y*(*t*) in equation 4.11 (Gold & Shadlen, 2001).^{1} DD models can also capture bottom-up (stimulus-driven) and top-down influences such as attention and expectation of rewards via variable drift rates (Liu et al., 2008; Eckhoff, Holmes, Law, Connolly, & Gold, 2008).

Since accumulator models may be derived from biophysical models of spiking neurons (Wang 2002; Wong & Wang, 2006), in which their activities represent short-term averages of collective firing rates, this suggests a mechanism by which neural substrates may be able to perform Bayesian computations. Specifically, in reducing the coupled Bayesian inference model, equation 2.9, to a DD process, we see how prior information maps into initial conditions, and evolving posteriors in log probability space are represented by spike rates of groups of neurons. In connection with the latter, we note that Bogacz and Gurney (2007) present computational and experimental evidence that Bayesian computations involving exponentiation and taking logarithms (cf. Yu & Dayan, 2005), as in section 4, can be approximated by neurons in the basal ganglia.

## Appendix: Mathematical and Data Fitting Details

### A.1. Evaluation of Integrals.

*t*), σ(

*t*). Figure 12 indicates the domain of integration: Here we have added subscripts to the time-varying means and standard deviations μ

_{j}(

*t*), σ

_{j}(

*t*), using the same shorthand

*z*=

_{j}*z*

_{1,j}as in section 4.4 to indicate which of the four cases

*s*

_{2}= ±1;

*M*= 1, 2 enumerated in section 4.3 is intended.

### A.2. Data Fitting Method.

*L*

^{2}) distance between vectors and with components

*u*and

_{j}*v*is Vectors describing accuracies and RT histograms were first formed from the data (), and corresponding model predictions () were formed and their differences computed by equation A.6. Since the units of accuracy and RT differ, each of these was then weighted by dividing it by the mean of the data, as indicated by an overbar in equation A.7. This produces the nondimensional quantity: This error term, representing the sum of percentage differences in accuracy and RT, was then minimized. Note that the resulting value depends on the number of RT bins in the data, and so should be normalized with respect to this when comparing fits of data sets with differing numbers of bins.

_{j}## Acknowledgments

This work was supported by PHS grants MH58480 and MH62196 (Cognitive and Neural Mechanisms of Conflict and Control, Silvio M. Conte Center). Y.L. benefited from studentship support from the School of Engineering and Applied Science at Princeton University, and A.Y. received funding from an NIH NRSA institutional training grant. We thank the referees for perceptive and helpful comments.