## Abstract

The Eriksen task is a classical paradigm that explores the effects of competing sensory inputs on response tendencies and the nature of selective attention in controlling these processes. In this task, conflicting flanker stimuli interfere with the processing of a central target, especially on short reaction time trials. This task has been modeled by neural networks and more recently by a normative Bayesian account. Here, we analyze the dynamics of the Bayesian models, which are nonlinear, coupled discrete time dynamical systems, by considering simplified, approximate systems that are linear and decoupled. Analytical solutions of these allow us to describe how posterior probabilities and psychometric functions depend on model parameters. We compare our results with numerical simulations of the original models and derive fits to experimental data, showing that agreements are rather good. We also investigate continuum limits of these simplified dynamical systems and demonstrate that Bayesian updating is closely related to a drift-diffusion process, whose implementation in neural network models has been extensively studied. This provides insight into how neural substrates can implement Bayesian computations.

## 1.  Introduction

The psychological (Laming, 1968; Ratcliff, 1978; Ratcliff, Van Zandt, & McKoon, 1999) and neural bases of decision making (Platt & Glimcher, 2001; Schall, 2001; Gold & Shadlen, 2001) have been widely studied, particularly in constrained situations such as the two-alternative forced-choice (2AFC) task. In 2AFC, subjects are required to discriminate a stimulus and give one of two permissible responses. The sequential probability ratio test (SPRT) is optimal for 2AFC tasks, whether the objective is to minimize the mean reaction time (RT) for a desired accuracy level (Wald & Wolfowitz, 1948) or to minimize a linear cost function in accuracy and detection delay under the Bayesian formulation (Liu & Blostein, 1992). The SPRT compares the relative likelihoods of noisy inputs given two possible hypotheses and reaches a decision when the cumulative evidence for one of them exceeds a fixed threshold. Performance on 2AFC tasks seems broadly consistent with the SPRT (Ratcliff & Smith, 2004), and there is evidence that competing neural populations subserving decision making may implement a strategy close to the SPRT (Gold & Shadlen, 2001, 2002; Schall, 2001; Shadlen & Newsome, 2001; Roitman & Shadlen, 2002; Schall, Stuphorn, & Brown, 2002). Moreover, the continuum limit of SPRT is an analytically tractable drift-diffusion model (DDM) (Holmes et al., 2005), which yields explicit expressions for error rates and reaction times that can be used to investigate reward-rate maximization in 2AFC (Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006), and it has been shown that various neural network models of decision making (Cohen, Dunbar, & McClelland, 1990; Cohen, Servan-Schreiber, & McClelland, 1992; Usher & McClelland, 2001) can be reduced to variants of the DDM (Bogacz et al., 2006).

The Eriksen flanker task (Eriksen & Eriksen, 1974) is an extension of the classical 2AFC task in which the decision is complicated by potentially conflicting distractor inputs. Subjects are required to discriminate a central target stimulus (e.g., the letter H or S) flanked by distractors. Flankers can be either compatible with the central target (e.g., HHHHH) or incompatible (e.g., HHSHH). Subjects display a compatibility effect, being typically slower and less accurate on incompatible than compatible trials (Eriksen & Eriksen, 1974). Furthermore, subjects perform below chance at short RTs on incompatible trials only. This dip in accuracy implies that flanker interference is particularly potent shortly after stimulus presentation. Figure 1 shows data from two instances of a deadlined Eriksen task. While specific details of reaction time distributions and relationships between accuracy and reaction time differ between the two studies, the basic compatibility effect and the dip in accuracy on incompatible trials are prominent in both.

Figure 1:

Accuracy versus RT in the Eriksen task. Human subjects respond more slowly and less accurately in the incompatible condition. In particular, accuracy is below chance (.50) for short RTs but approaches 1 for longer RTs. (A) Reaction times gauged by electromyographic activities (EMG), adapted from Gratton, Coles, Sirevaag, Eriksen, and Donchin (1988). (B) Behavioral data from Servan-Schreiber, Bruno, Carter, and Cohen (1998). Details differ, but the compatibility effect and dip in accuracy for short-reaction incompatible trials are obvious in both data sets.

Since the Eriksen task extends the standard 2AFC task, we suspect that the optimal policy in this case is similar to the SPRT. In this vein, Yu, Dayan, and Cohen (in press) modeled the computations underlying the Eriksen task as iterative Bayesian updating, with the decision being made (and the trial terminated) when the cumulative posterior for one of the two possible target stimuli exceeds a fixed threshold. It was also proposed that the apparent suboptimality in performance can be explained by either an incorrect prior on the relative frequency of compatible and incompatible trials (compatibility bias model) or by inherent spatial overlap of visual processing neurons (spatial uncertainty model; Yu et al., in press). Here we reduce the Bayesian models to simpler dynamical systems and study them analytically and numerically. While the simpler models closely approximate the original ones in dynamics and performance, their analytical tractability yields explicit expressions for the dependence of inferential and psychometric quantities on model parameters. We discuss the relationship between exact Bayesian inference and drift-diffusion processes, emphasizing the link that this establishes between Bayesian updating and the neural substrates that may execute it. Our analysis also reveals the formal similarity of computations underlying the compatibility bias and spatial uncertainty models, which were motivated by disparate experimental literature and were formulated differently within the Bayesian framework.

The article is organized as follows. After reviewing the Bayesian inference models in section 2, in section 3 we derive and analyze the simplified models: uncoupled, linear discrete dynamical systems. From these, we derive explicit criteria on parameters that predict the dip in accuracy for incompatible trials, and we compare accuracies and RT distributions generated by the full and simplified models. In section 4 we show that the DDM is a continuum limit of the simplified models, and from this, we derive analytical predictions for mean posterior probabilities. We also compute accuracy versus time curves and reaction time distributions under an approximation that violates the first passage threshold crossing criterion adopted in Yu et al. (in press) but permits explicit analysis, and we provide direct comparisons between behavioral data and predictions of the full and approximate compatibility bias models. Section 5 contains a summary and discussion.

## 2.  A Bayesian Framework for the Eriksen Task

We briefly review the compatibility bias and spatial uncertainty inference models for the Eriksen task proposed by Yu et al. (in press). The generative process common to both models consists of the prior probability distribution over trial type (M = 1 if compatible, M = 2 if incompatible), the stochastic relationship between the trial type M and the stimuli S = (s1, s2, s3), and the stochastic relationship between the stimuli and the noisy inputs X into the visual system. For simplicity, it was assumed that there are three stimuli, s1, s2, and s3, for left, center, and right, respectively; and each one of three neural units or populations responds to one stimulus. Here the pairs of left and right flankers are combined in s1 and s3, respectively, and we assume that all three inputs contain independent noise, among the units and populations and over time. Using integers si = ±1 to denote S and H, and M = 1, 2 to denote compatible and incompatible trials, respectively, we may formally describe the process as
P(M = 1) = β,   P(M = 2) = 1 − β,
2.1
P(s2 = 1) = P(s2 = −1) = 1/2,
2.2
s1 = s3 = s2 if M = 1,   s1 = s3 = −s2 if M = 2,
2.3
p(x(t) | s1, s2, s3) = ∏_{i=1}^{3} p(xi(t) | si),
2.4
p(Xt | s1, s2, s3) = ∏_{τ=1}^{t} p(x(τ) | s1, s2, s3).
2.5
For the compatibility bias model, the prior probability β for compatible trials is assumed to be higher than the “true” value 0.5, and the inputs are taken to be normally distributed as a function of their respective stimuli and independent of neighboring stimuli:
p(xi(t) | si) = (2πσ²)^{−1/2} exp(−(xi(t) − si)²/(2σ²)),
2.6
that is, at each step t, the xi(t) are independently drawn from normal distributions with means si and standard deviations σ. We denote this procedure below by xi(t) ∼ N(si, σ).
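The generative step of the compatibility bias model can be sketched in a few lines of Python (a minimal illustration under the assumptions above; the function name and defaults are our own):

```python
import numpy as np

def sample_trial(beta=0.9, sigma=9.0, T=200, rng=None):
    """Draw one trial: trial type M (1 = compatible, with probability beta),
    target s2 = +/-1, flankers s1 = s3 = +/-s2, then T steps of noisy inputs
    x_i(t) ~ N(s_i, sigma), independent across units and time."""
    rng = np.random.default_rng() if rng is None else rng
    M = 1 if rng.random() < beta else 2
    s2 = rng.choice([-1, 1])
    s1 = s3 = s2 if M == 1 else -s2
    s = np.array([s1, s2, s3], dtype=float)
    X = s + sigma * rng.standard_normal((T, 3))   # one row of inputs per time step
    return M, s, X
```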
In the spatial uncertainty model, the correct prior β = 0.5 is assumed, but the inputs are corrupted by their neighbors according to
2.7
2.8
where a1, σ1 denote influence from the primary stimulus and a2 and σ2 that from the flankers.
Define z_t^{i,j} ≡ P(s2 = i, M = j | Xt) for the posterior probabilities and f_{i,j}(x(t)) ≡ p(x(t) | s2 = i, M = j) for the likelihood functions, where i ∈ {−1, +1}, j ∈ {1, 2}. Based on Bayes' rule, this yields our inference model: four discrete-time dynamical equations, coupled through normalization,
z_t^{i,j} = f_{i,j}(x(t)) z_{t−1}^{i,j} / ∑_{k,l} f_{k,l}(x(t)) z_{t−1}^{k,l},
2.9
with initial conditions
z_0^{1,1} = z_0^{−1,1} = β/2,   z_0^{1,2} = z_0^{−1,2} = (1 − β)/2.
2.10
To make a decision based on the accumulating inputs, we compare the cumulative marginal posterior probability,
P(s2 = i | Xt) = z_t^{i,1} + z_t^{i,2},
2.11
against a decision threshold q, a policy closely related to the SPRT (Wald, 1947). As soon as P(s2 = i | Xt) exceeds q for i = −1 or i = +1, the system chooses the corresponding response (H or S) and terminates observations for the current trial. The computation for the marginal posterior probability over compatibility is analogous: P(M = j | Xt) = z_t^{−1,j} + z_t^{1,j}.

Examples of accuracies and RTs thus predicted are shown in Figure 4. For these and other calculations, unless otherwise noted, we adopt the parameter values used in Yu et al. (in press): σ = 9 for the compatibility bias model and σ1 = 7, σ2 = 5, a1 = 0.7, a2 = 0.3 for the spatial uncertainty model, and q = 0.9 for both.
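The update-and-threshold scheme can be sketched as follows (our own minimal implementation, assuming the prior bias β = 0.9 used later in the text; since the quadratic and constant terms of the gaussian likelihoods cancel under normalization, each hypothesis contributes a factor proportional to exp(s · x / σ²)):

```python
import numpy as np

def run_trial(X, beta=0.9, sigma=9.0, q=0.9):
    """Iterate the Bayes update over the four hypotheses (s2, M) and respond
    when the marginal posterior for s2 = +1 or s2 = -1 first exceeds q.
    Returns (choice, rt); choice = 0 and rt = None if no crossing occurs."""
    hyps = [(-1, 1), (-1, 2), (1, 1), (1, 2)]
    # stimulus array implied by each hypothesis: compatible copies the target,
    # incompatible flips the flankers
    arrays = np.array([[i, i, i] if j == 1 else [-i, i, -i] for i, j in hyps],
                      dtype=float)
    z = np.array([beta / 2, (1 - beta) / 2, beta / 2, (1 - beta) / 2])
    for t, x in enumerate(X, start=1):
        logL = arrays @ x / sigma**2          # quadratic terms cancel in the ratio
        z = z * np.exp(logL - logL.max())
        z /= z.sum()
        p_plus = z[2] + z[3]                  # P(s2 = +1 | X_t)
        if p_plus > q:
            return 1, t
        if 1 - p_plus > q:
            return -1, t
    return 0, None
```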

## 3.  Linearization and Parametric Dependence

Yu et al. (in press) showed that certain choices of parameters allow both the compatibility bias and spatial uncertainty models to capture key properties of the behavioral data in Figure 1 (see Figure 4 below). Here we derive general constraints on the parameters in each model that allow them to reproduce the behavioral data: σ for the compatibility bias model; a1, a2, σ1, and σ2 for the spatial uncertainty model; and n, the number of distractors. While we cannot analyze the complex relationship between accuracy and reaction time directly, we wish to at least constrain parameters so that the mean posterior probability for s2 = 1 (the correct answer) dips below that for s2 = −1 after one or a few time steps of observations. Although the relative probability of a correct response at time t depends not just on the mean but also on higher-order moments, such an analysis would illuminate the magnitude and range of the effective parameters. Unfortunately, even this partial analysis is difficult for the original Bayesian model, as P(s2 | Xt) involves the summation of two exponential functions of the inputs, as in equation 2.11, and there is no obvious way to derive the expectation of P(s2 | Xt) as an explicit function of the parameters that specify the generation of the inputs X.

Due to such computational intractability, we instead work with a linearized approximation to the exact posterior update rule of equation 2.9. We will motivate and describe the approximations for the two Bayesian models and demonstrate via simulations that the parametric constraints derived from this approximate scheme provide useful bounds for the original Bayesian models.

### 3.1.  The Compatibility Bias Model.

Given our assumption of independent, normally distributed inputs (see equations 2.4 and 2.6), we have
p(x(t) | s1, s2, s3) = ∏_{k=1}^{3} (2πσ²)^{−1/2} exp(−(xk(t) − sk)²/(2σ²)),
3.1
where each si can take on the value ±1. We now derive an approximation to equation 3.1 that is linear in the xi(t)'s. With the following definition,
Θk ≡ p(xk(t) | sk = 1) + p(xk(t) | sk = −1),
the likelihood function for s2 = 1 and M = 1 (i.e., s1 = s2 = s3 = 1) can be approximated as follows:
p(x(t) | s1 = s2 = s3 = 1) = ∏_{k=1}^{3} Θk e^{xk(t)/σ²} / (e^{xk(t)/σ²} + e^{−xk(t)/σ²}) ≈ (Θ1Θ2Θ3/8) (1 + (x1(t) + x2(t) + x3(t))/σ²).
3.2
The first step uses the fact that quadratic and constant terms cancel in the ratios. The next two rely on the Taylor series expansion e^u ≈ 1 + u of the exponential terms and the binomial series approximation (1 + u)^{−1} ≈ 1 − u:
The approximation is justified by the fact that xk(t) ∈ [−1 − 2σ, 1 + 2σ] with >99% probability if we can assume that σ ≫ 1. This latter assumption is reasonable since we are modeling the timescale at which, on average, many time steps of inputs are needed to perform the discrimination.
Generalizing the approximation 3.2 to the other three cases and using the four resulting expressions in equation 2.9, we obtain the following approximate update rules:
z_t^{i,j} = [1 + (s1 x1(t) + s2 x2(t) + s3 x3(t))/σ²] z_{t−1}^{i,j} / Dt, with (s1, s2, s3) the stimulus array implied by the hypothesis (i, j),
3.3
in which the denominator,
Dt = ∑_{i,j} [1 + (s1 x1(t) + s2 x2(t) + s3 x3(t))/σ²] z_{t−1}^{i,j},
3.4
is the sum of all four numerators and normalizes the posterior distribution, and the common factors Θ1Θ2Θ3/8 in the numerators and denominator of equation 3.3 have canceled. Initial conditions are as in equation 2.10. Since this simplified system is still nonlinearly coupled through the denominator Dt, we work with the joint probability v_t^{i,j} ≡ p(s2 = i, M = j, Xt) instead. The two are related as follows:
z_t^{i,j} = v_t^{i,j} / ∑_{k,l} v_t^{k,l}.
3.5
The joint probability v_t^{i,j} obeys the uncoupled update rule:
v_t^{i,j} = [1 + (s1 x1(t) + s2 x2(t) + s3 x3(t))/σ²] v_{t−1}^{i,j},
3.6
where the sign preceding each xi depends on i and j as in equation 3.3. As is apparent in equation 3.5, z_t^{i,j} can be obtained by normalizing v_t^{i,j} on time step t, but v_t^{i,j} cannot be used directly in the perceptual decision process, since a fixed threshold in the posterior probability space has no equivalent fixed value in the joint posterior space. However, v_t^{i,j} is sufficient for deriving bounds on the generative parameters that on average make the posterior probability for s2 = 1 dip below that for s2 = −1 after one time step, when the inputs are generated from the incompatible stimulus array (s1, s2, s3) = (−1, 1, −1) (the analysis for (1, −1, 1) is analogous). Specifically, since P(s2, M | Xt) = p(s2, M, Xt)/p(Xt), the condition ⟨z_1^{1,1} + z_1^{1,2}⟩ < ⟨z_1^{−1,1} + z_1^{−1,2}⟩ is equivalent to ⟨v_1^{1,1}⟩ + ⟨v_1^{1,2}⟩ < ⟨v_1^{−1,1}⟩ + ⟨v_1^{−1,2}⟩. We therefore require
(β/2)(1 − 1/σ²) + ((1 − β)/2)(1 + 3/σ²) < (β/2)(1 + 1/σ²) + ((1 − β)/2)(1 − 3/σ²),
3.7
since the mean values of x1, x2, and x3 are −1, 1, and −1, respectively, and the compatible terms are weighted by the compatibility prior bias β (and the incompatible ones weighted by 1 − β).

Hence, β > 3/4 is the necessary and sufficient condition for the average posterior probability for s2 = 1 to dip below that for s2 = −1 after one observation, when the true stimuli are the incompatible array (−1, 1, −1). More generally, we can show that the constraint is β > (n + 1)/(2n), where n is the total number of flankers. This makes intuitive sense, for it suggests that the dip is more prominent or more likely to happen when the subject's prior compatibility bias is stronger or the number of flankers is larger. Indeed, there are behavioral data suggesting that flanker interference effects are stronger when there is a lower frequency of incompatible trials (Gratton, Coles, & Donchin, 1992).
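The one-step condition behind this bound can be checked directly. The sketch below (our own naming) computes the linearized mean joint posteriors after a single observation of an incompatible array with n flankers and tests whether the wrong response is favored on average:

```python
def dip_after_one_step(beta, sigma=9.0, n=2):
    """Linearized mean posteriors after one step, for an incompatible array
    (target mean +1, each of n flankers mean -1).  For each hypothesis the
    one-step factor is 1 + (sum_k s_k * E[x_k]) / sigma^2, weighted by its
    prior.  Returns True when the mean favors s2 = -1, i.e. a dip."""
    s2_ = sigma**2
    correct = beta / 2 * (1 + (1 - n) / s2_) \
            + (1 - beta) / 2 * (1 + (1 + n) / s2_)
    wrong = beta / 2 * (1 + (n - 1) / s2_) \
          + (1 - beta) / 2 * (1 - (1 + n) / s2_)
    return correct < wrong
```

The crossover sits at β = (n + 1)/(2n) for any σ, e.g. 0.75 for the n = 2 combined-flanker case.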

These analytical constraints guarantee a dip only in the posterior probability. As shown in Figure 2 (left), for a particular set of model parameters, the mean accuracy in incompatible trials terminating within 20 time steps steadily decreases as a function of β, and it drops below .5, indicating the presence of a dip, for all values of β > 0.82: somewhat higher than β = 0.75, the lower bound of inequality (see equation 3.7) that results in a dip in posterior probability.

Figure 2:

Simulated and analytical approximations of parameter values that produce dips in accuracy versus reaction time for incompatible trials. Graphs show accuracy averaged over trials with simulated reaction times under 20 time steps, as a function of β for the compatibility bias model (left) and the ratio of means a1/a2 for the spatial uncertainty model (right). Crossings with the 0.5 accuracy line indicate numerically obtained estimates of the “true” parameter constraints; dashed lines show the approximate constraints of equations 3.7 and 3.13.

A major factor underlying the discrepancy between the two constraints is that we considered only the mean of the posterior probability, not the full distribution. The mean accuracy depends not only on the mean posterior value but also on higher moments. If the distribution were symmetric about its mean, the dip in the mean posterior would directly translate into a dip in accuracy, but as we will show in section 4, the distribution of the posterior trajectories is strongly skewed, and the interaction of that skewness with the decision threshold also plays a role in determining the presence of the dip in accuracy versus reaction time.

A second reason for the discrepancy is that the theoretical bounds are for the dip to occur in the posterior after one time step, whereas in the numerical simulations, due to the infrequency of responses at very short RTs, all trials that terminate within the first 20 time steps were used to estimate accuracy. If the temporal extent of the dip in the posterior distribution is very small (which is likely in boundary cases), then conditional accuracy may not fall below 0.5 when averaged over 20 time steps. The numerically obtained constraints are therefore likely to be more conservative than the analytical approximations.

### 3.2.  The Spatial Uncertainty Model.

Derivation of the iterated maps for the spatial uncertainty model is more tedious than that of equation 3.3 due to the extra “cross-talk” links in the generative model, but it follows from similar reasoning. Defining the analogous factors, forming the triple product, and dividing through by
3.8
we obtain the approximate update rule:
3.9
where Dt is again the sum of the numerators and the parameters Ai are
3.10
Since the prior distribution is uniform, the initial conditions for equation 3.9 are
z_0^{i,j} = 1/4,   i ∈ {−1, +1}, j ∈ {1, 2}.
3.11
Again, the constraint,
⟨z_1^{1,1} + z_1^{1,2}⟩ < ⟨z_1^{−1,1} + z_1^{−1,2}⟩,
3.12
is satisfied if A4(a1 − 2a2) < 2 A3(a1a2), or, equivalently, if the ratio of means, a1/a2, lies in the interval
3.13
where the remaining parameter is the ratio of the variances, σ1²/σ2². Intuitively, if the ratio a1/a2 is too large, the flankers have negligible effects; if it is too small, the inputs lose their spatial selectivity altogether. More generally, if there are n flankers, the range is
We now compare these constraints with numerical simulations of the full inference model for the specific noise parameters (σ1 = 7, σ2 = 5). We simulated the full model using a range of values of a1 and a2 (with their sum held at 1) and obtained accuracy of all responses falling within the first 20 time steps as a function of a1/a2. As can be seen in Figure 2 (right), the accuracy in this short-RT bin is less than .5 when a1/a2 falls within (0.70, 3.55), a somewhat more stringent condition than the analytically derived (approximate) interval (0.67, 3.98).

### 3.3.  Evaluating the Cost of Linearization.

Direct simulations of the linear approximation can be compared with those of the original inference model. Figure 3 shows the results for the compatibility bias model for a particular setting of parameters (σ = 9), comparing the full inference model with the simplified iteration of equation 3.6. The same sequence of noisy observations xi(t) was used for both processes, and in computing the value of z_t^{i,j} for the latter at each time step t, normalization was applied only at that step. The agreement is remarkably good, validating our linear approximations to the products of probabilities developed in section 3. The quality of the linear approximation for the spatial uncertainty model is similarly good (details not shown).

Figure 3:

Posterior probability for one sample path of the approximate compatibility bias model (see equation 3.6; dashed line), compared with a sample path from the original inference model (see equation 2.9; solid line). The same sequence of observations was used in both cases.

We can also simulate perceptual discrimination based on the linearized evidence accumulation process, using the first passage criterion for threshold crossing appropriate for free response conditions. As in Yu et al. (in press), we adopt the decision threshold q = 0.9 for both the compatibility bias and the spatial uncertainty model. The time span, taken here as 200 steps, is divided into 10 bins, and sample paths for the full model, equation 2.9, and the approximate decoupled system, equation 3.6, and its analogue for spatial uncertainty are computed. The decoupled results are then normalized by dividing by the sum ∑_{i,j} v_t^{i,j} at each t in the current bin (normalization is not applied for steps 1 through t − 1). The same (unit) step size is used in all cases. Responses are logged when the first of the probabilities P(s2 = +1 | Xt) = P(s2 = +1, M = 1 | Xt) + P(s2 = +1, M = 2 | Xt) or P(s2 = −1 | Xt) = P(s2 = −1, M = 1 | Xt) + P(s2 = −1, M = 2 | Xt) crosses q. After collecting sufficiently many paths (2000 in this case), response time histograms are formed and the fraction of correct responses in each bin summed to yield accuracy versus time curves.
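A compact Monte Carlo version of this free-response procedure for the full compatibility bias model can be sketched as follows (our own code and trial counts; the true target is fixed at s2 = +1):

```python
import numpy as np

def simulate(compatible, n_trials=300, beta=0.9, sigma=9.0, q=0.9, T=200, seed=0):
    """Simulate free-response trials: accumulate the posterior over the four
    (s2, M) hypotheses and log a response at the first threshold crossing.
    Returns (choices, rts); choice 0 marks trials with no crossing by T."""
    rng = np.random.default_rng(seed)
    hyps = [(-1, 1), (-1, 2), (1, 1), (1, 2)]
    arrays = np.array([[i, i, i] if j == 1 else [-i, i, -i] for i, j in hyps],
                      dtype=float)
    prior = np.array([beta / 2, (1 - beta) / 2, beta / 2, (1 - beta) / 2])
    true_s = np.array([1.0, 1.0, 1.0]) if compatible else np.array([-1.0, 1.0, -1.0])
    choices = np.zeros(n_trials, dtype=int)
    rts = np.full(n_trials, T)
    for k in range(n_trials):
        z = prior.copy()
        for t in range(1, T + 1):
            x = true_s + sigma * rng.standard_normal(3)
            z = z * np.exp(arrays @ x / sigma**2)
            z /= z.sum()
            if z[2] + z[3] > q:         # P(s2 = +1 | X_t) crossed
                choices[k], rts[k] = 1, t
                break
            if z[0] + z[1] > q:         # P(s2 = -1 | X_t) crossed
                choices[k], rts[k] = -1, t
                break
    return choices, rts
```

Binning the returned reaction times and averaging correctness within each bin reproduces accuracy-versus-time curves of the kind shown in Figure 4.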

Figure 4 illustrates the results of such simulations for the compatibility and spatial uncertainty models. Accuracy versus reaction time, and empirical distributions of reaction time, are shown for both the full and approximate models. The approximate systems reproduce the characteristic dip in accuracy for fast, incompatible trials for both models, and the accuracy curves and reaction time distributions predicted by the approximate theory agree well with those of the full inference models. Note that the use of the first passage criterion for response produces reaction time distributions that agree with the exact model in details of their shapes: a rise at short reaction times to a peak, followed by a long tail. The distributions for incompatible trials are also flatter and shifted rightward compared to those for compatible trials, as in the data of Figure 1.

Figure 4:

(Top) Accuracy and reaction time distributions for the compatibility bias model for compatible stimuli (left) and incompatible stimuli (right). Solid and right-hand (black) bar of each RT bin pair from full inference model of Yu et al. (in press); dashed and left-hand (gray-shaded) bars from approximate linearized likelihood model. (Bottom) Accuracy and reaction time distributions for the spatial uncertainty model. Results obtained by averaging over 2000 simulated trials in each case.

## 4.  A Continuum Limit

The key difficulty in working with the discrete dynamical systems 3.3 and 3.9 lies in the nonlinear coupling of the posteriors z_t^{i,j} through the denominators Dt of equations 3.4 and 3.8. It can be proved that individual sample paths generated with the same noise inputs are identical whether computed by iteration of equations 3.3 and 3.9 or by the analogous uncoupled systems, equation 3.6, with posteriors normalized only at the last time step (cf. equation 3.5). (In computing the values for the approximate model 3.6 at each step t for Figure 3, normalization was applied only at that step, but not at steps 1 through t − 1, while the full iteration, equation 2.9, is normalized at every step.) However, it does not follow that we may average over many realizations of the unnormalized process and then normalize (as discussed further in section 4.3), since these operations do not commute. Nonetheless, we can decouple the dynamics by replacing the normalization constant Dt at each time step with its expectation 〈Dt〉, which does not depend on the inputs, and by replacing that in turn by a constant. We then take continuum limits of the resulting decoupled linear systems to form stochastic differential equations (SDEs), allowing us to use simple analytical results to compute properties of interest. As described in section 5, these SDEs may be related to neurally based models of evidence accumulation.

### 4.1.  Approximating the Denominators.

We first examine the denominator 〈Dt〉 for the compatibility bias model:
where the approximation comes from assuming that the input-dependent terms (functions of xk(t)) are independent from the z^{i,j} terms, which depend on the previous inputs xk(1), …, xk(t − 1). Although the inputs are conditionally independent (cf. equation 2.5), they are marginally dependent. That is, if previous inputs favored a particular setting of s2 and M, the current one also tends to do the same. For analytical simplicity, we ignore this statistical dependence. Note that in the limit as t → ∞, one of the z_t^{i,j}'s (corresponding to the actual stimulus setting) converges to 1 (and the others to 0), and that no matter which z_t^{i,j} it is,
⟨Dt⟩ → 1 + 3μ/σ².
4.1
More generally, we expect 〈Dt〉 to increase from 1 (D0 is just the sum of the priors) to 1 + 3μ/σ², where μ denotes the mean value of the xj's. Figure 5 shows exactly this for both compatible and incompatible stimuli for a particular setting of the model parameters, where s2 = 1 and we have averaged over 10^5 trials. Convergence is slower for incompatible stimuli, since the compatibility prior takes time to update from its initial value P(M = 1) = 0.9.
Figure 5:

Mean values of the denominator 〈Dt〉 for compatible (solid) and incompatible (dashed) stimuli, each averaged over 10^5 trials. In both cases, 〈Dt〉 rises monotonically toward its upper bound given in equation 4.1.

Based on these arguments, and in spite of the fact that Dt can exhibit large variance on individual trials, we assume Dt ≈ 〈Dt〉 ≈ 1, and approximate the dynamics of equation 3.3 by the following linear, decoupled system:
z_t^{i,j} = [1 + (s1 x1(t) + s2 x2(t) + s3 x3(t))/σ²] z_{t−1}^{i,j},
4.2
with initial conditions
z_0^{1,1} = z_0^{−1,1} = β/2,   z_0^{1,2} = z_0^{−1,2} = (1 − β)/2.
4.3
Similar reasoning can be used to derive a linear, decoupled approximation for equation 3.9 for the spatial uncertainty model. The approximate dynamics for both models can be written as an iterated linear mapping in the following form,
z_{t+1}^{i,j} = (a_{i,j} + b_{i,j} η(t)) z_t^{i,j},
4.4
where the random variables η(t) are drawn from a standard normal distribution, and a_{i,j} and b_{i,j} are constant parameters whose values depend on the model, the probability being computed, and the compatibility condition of the given trial.
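A sample path of an iterated linear mapping of this kind is trivial to generate (a sketch with our own function name; for b = 0 it reduces to deterministic geometric growth or decay):

```python
import numpy as np

def iterate_map(a, b, z0, T, rng=None):
    """Iterate z_{t+1} = (a + b * eta_t) * z_t with eta_t standard normal,
    returning the whole trajectory z_0, ..., z_T."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.empty(T + 1)
    z[0] = z0
    for t in range(T):
        z[t + 1] = (a + b * rng.standard_normal()) * z[t]
    return z
```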
For the compatibility bias model, from the details presented in section 3.1, if the current stimulus array is compatible and s2 = 1, we have
a_{1,1} = 1 + 3/σ²,   a_{1,2} = 1 − 1/σ²,   a_{−1,1} = 1 − 3/σ²,   a_{−1,2} = 1 + 1/σ²,   b_{i,j} = √3/σ,
4.5
and if it is incompatible and s2 = 1, we have
a_{1,1} = 1 − 1/σ²,   a_{1,2} = 1 + 3/σ²,   a_{−1,1} = 1 + 1/σ²,   a_{−1,2} = 1 − 3/σ²,   b_{i,j} = √3/σ.
4.6
For s2 = −1, all the signs in a_{i,j} above are reversed.
For the spatial uncertainty model with compatible stimulus array and s2 = 1, the calculations of section 3.2 imply:
4.7
and for an incompatible stimulus array and s2 = 1:
4.8
In both cases, the standard deviation of the noise is given by
4.9
Figure 6 illustrates normal distributions from which these multiplicative terms in equation 4.4 are drawn.
Figure 6:

Typical distributions from which the multiplicative factors ai, j + bi, j η(t) in equation 4.4 are drawn on each time step. Parameter values are σ = 1.8 (top) and a1 = 0.7, a2 = 0.3, σ1 = 1.4, σ2 = 1 (bottom). For illustrative purposes, standard deviations σ, σ1, σ2 are 20% of those used in the text to reduce overlap of distributions.

### 4.2.  Taking the Continuum Limit.

We now take continuum limits of the discrete dynamical systems derived above that will allow us to compute properties of interest analytically. First, consider the following finite-difference limit of the iterated mapping, equation 4.4,
z_{t+δt}^{i,j} = z_t^{i,j} + (a_{i,j} − 1) z_t^{i,j} δt + b_{i,j} z_t^{i,j} √δt η(t),
4.10
where the z_t^{i,j} represent the four joint posteriors P(s2 = i, M = j | Xt). For finite but small δt = 1/k, this represents a finer-grained discretization in which k steps are taken for every one step of equation 4.4, the deterministic increments being of order δt and the random ones of order √δt (Higham, 2001). Taking the limit δt → 0 in equation 4.10, letting y_{i,j} = log(z_{i,j}), and appealing to the Ito formula (Oksendal, 2002, sec. 4.1), we obtain independent, uncoupled SDEs for y_{i,j}(t):
dy_{i,j} = A_{i,j} dt + B_{i,j} dW,
4.11
with constant coefficients A_{i,j} = a_{i,j} − 1 − b_{i,j}²/2 and B_{i,j} = b_{i,j}, whose values are specified in section 4.1. Since each z_{i,j}(t) represents a posterior probability, it should take values in the interval [0, 1], so we shall be interested in sample paths y_{i,j}(t) that start at y_{i,j}(0) < 0 and satisfy −∞ < y_{i,j}(t) ≤ 0.
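The finite-difference scheme above is the standard Euler–Maruyama discretization, and a quick numerical check (our own code, generic parameter values) confirms the textbook fact that solutions of dy = A dt + B dW started at y0 are gaussian with mean y0 + At and standard deviation B√t:

```python
import numpy as np

def euler_maruyama(A, B, y0, T=2.0, k=1000, n_paths=20000, seed=0):
    """Integrate dy = A dt + B dW from y(0) = y0 with step dt = T/k;
    returns the terminal values y(T) for all sample paths."""
    rng = np.random.default_rng(seed)
    dt = T / k
    y = np.full(n_paths, float(y0))
    for _ in range(k):
        y += A * dt + B * np.sqrt(dt) * rng.standard_normal(n_paths)
    return y
```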

### 4.3.  Analytical Approximations for the Mean Posteriors.

The SDE, equation 4.11, describes a drift-diffusion process with constant signal and noise level, which has been extensively studied (e.g., Gardiner, 1985; Oksendal, 2002). In particular, for solutions (sample paths) started at y(0) = μ0 and t = 0, the probability density function of y at time t is the following gaussian distribution:
p(y, t) = (2πσ(t)²)^{−1/2} exp(−(y − μ(t))²/(2σ(t)²)),
4.12
where
μ(t) = μ0 + A t,   σ(t) = B √t.
4.13
(Here and below, we drop the subscripts {i, j} in y and z in the understanding that the appropriate coefficients will be used in the final formulas.) We now transform back into z-space, using y = log(z) and p(z, t) = p(y, t) (dy/dz) = p(log z, t)/z, to obtain the density:
p(z, t) = (2πσ(t)²)^{−1/2} (1/z) exp(−(log z − μ(t))²/(2σ(t)²)).
4.14
The inverse transformation z = exp(y) takes the gaussian distribution over y into a function skewed toward z = 1, as illustrated in Figure 7.
Figure 7:

Probability density functions in logarithmic y-space and the original z-space.

The gaussian distribution over y takes positive values on y > 0 for all t > 0. This presents a problem, since z = exp(y) > 1 for y > 0, contrary to z's designation as a probability measure. Therefore, when computing expected values of , which requires integration of the quantity z p(z, t), we replace all values of z > 1 by z = 1 (or values of y > 0 by y = 0 in the equivalent integral over y). However, to retain analytical tractability, we continue to assume a gaussian distribution over y at time t when generating the distribution at time t + 1—that is, we replace the inappropriate values of y (or x) only in the integral, not in the underlying drift-diffusion process. The expected (mean) value of z is therefore approximated as
$$
\langle z(t)\rangle \approx \int_{-\infty}^{0} e^{y}\, p(y, t)\, dy \;+\; \int_{0}^{\infty} p(y, t)\, dy,
\tag{4.15}
$$

which may be evaluated as explained in appendix A to yield

$$
\langle z(t)\rangle \approx \tfrac{1}{2}\, e^{\mu(t)+\sigma^2(t)/2}\left[1 + \mathrm{erf}\!\left(\frac{-\mu(t)-\sigma^2(t)}{\sqrt{2}\,\sigma(t)}\right)\right] \;+\; \tfrac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{\mu(t)}{\sqrt{2}\,\sigma(t)}\right)\right].
\tag{4.16}
$$
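The truncated mean in equation 4.15 reduces to a standard expression in the normal CDF Φ: for y ∼ N(μ, σ²), E[min(eʸ, 1)] = e^{μ+σ²/2} Φ(−(μ+σ²)/σ) + Φ(μ/σ), which is the error-function form we take equation 4.16 to express. The sketch below checks this closed form against direct quadrature of equation 4.15; the parameter values are illustrative.

```python
import math

def Phi(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_mean_closed(mu, sigma):
    """Closed form for E[min(exp(Y), 1)], Y ~ N(mu, sigma^2)."""
    return (math.exp(mu + 0.5 * sigma ** 2) * Phi(-(mu + sigma ** 2) / sigma)
            + Phi(mu / sigma))

def truncated_mean_quadrature(mu, sigma, n=100000, width=10.0):
    """Trapezoid-rule evaluation of eq. 4.15: integrate min(exp(y), 1) p(y) dy."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        pdf = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += w * min(math.exp(y), 1.0) * pdf
    return total * h

mu, sigma = math.log(0.5) - 0.5, 0.5   # illustrative values with mu < 0
m_closed = truncated_mean_closed(mu, sigma)
m_quad = truncated_mean_quadrature(mu, sigma)
print(m_closed, m_quad)
```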
Substituting values appropriate for the compatibility bias model from equations 4.5 and 4.6 for the parameters ai, j and bi, j, and hence for Ai, j and Bi, j and, via equation 4.13, for μ(t) and σ(t), we obtain estimates for the four mean posterior probabilities at time t:
4.17
where D(t) is the sum of all four probabilities that normalizes the expressions, and for compatible stimuli the functions μ(t) and σ(t) are
4.18
and for incompatible stimuli,
4.19
Here, we also use the fact that all sample paths start with the initial conditions specified in equation 2.10 and that μ0 = μ(0) = log(z(0)).

As noted at the beginning of this section, normalization and averaging do not commute. This may be understood in terms of the distributions of Figure 7 as follows. While each sample path can be computed for the uncoupled processes and normalized at time t to yield the same result as a sample path of the coupled system (cf. Figure 3), different normalization factors must typically be applied to the values of different paths zi, j(t) at each time t. This would distort the distributions p(zi, j, t), thereby changing their means. However, we may appeal to the observation that the expected value of the denominator remains close to 1 (cf. Figure 5) to conclude that this distortion is likely to be small, and proceed by dividing by the sums of the four mean probability trajectory values at time t to normalize the resulting expressions.
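The claim that this distortion is small can be checked directly by Monte Carlo. The sketch below uses four independent lognormal variables as stand-ins for the unnormalized posteriors (the marginals and their parameters are our illustrative assumptions, chosen so that the mean of the denominator is close to 1), and compares normalizing each sample before averaging with normalizing the averages.

```python
import math
import random

# Illustrative log-space parameters for four unnormalized posteriors z_1..z_4,
# chosen so that E[z_i] = m_i and E[z_1 + ... + z_4] = 1.
sig = 0.2
means = (0.4, 0.3, 0.2, 0.1)
mus = [math.log(m) - 0.5 * sig ** 2 for m in means]

rng = random.Random(0)
n = 20000
mean_norm = [0.0] * 4   # E[z_i / sum_j z_j]: normalize each sample, then average
mean_raw = [0.0] * 4    # E[z_i]: average first
for _ in range(n):
    zs = [math.exp(rng.gauss(mu, sig)) for mu in mus]
    s = sum(zs)
    for i in range(4):
        mean_norm[i] += zs[i] / s / n
        mean_raw[i] += zs[i] / n

# Normalize the averages by their sum, as done in the text.
total = sum(mean_raw)
norm_of_means = [m / total for m in mean_raw]
diffs = [abs(a - b) for a, b in zip(mean_norm, norm_of_means)]
print(diffs)   # small when the denominator stays near 1
```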

Typical results for mean posterior probabilities are shown in Figure 8. The approximate predictions developed above are shown as dashed curves, and the results of averaging over 5000 simulated trials of the full inference model, equation 2.9, are shown solid; compatible and incompatible trials are shown in dark and light, respectively. As above, we compute 200 steps for the discrete iteration of the full system, and we evaluate the corresponding quantities for t ∈ [0, 200] time units from the formulas above. For P(M) = 0.5 (not shown), joint posteriors for correct responses increase similarly for both compatible and incompatible cases, but P(M) = 0.9 elicits markedly different behaviors (top left). The compatibility posteriors (top right) show a general rise for compatible stimuli and a monotonic fall for incompatible stimuli, while the correct-response posterior (top left) dips significantly below 0.5 at early times for incompatible stimuli and rises monotonically for compatible stimuli. As discussed in section 5, the resulting accuracies exhibit similar patterns to the experimental data, with the incompatible case showing a dip in accuracy for early responses. Evolutions of the four individual posterior probabilities are shown in the lower panels of Figure 8.

Figure 8:

Predictions of the full and simplified compatibility bias models in the case that the central symbol is S (s2 = 1) and with prior compatibility bias P(M) = 0.9. (Top left) Marginal mean posterior probabilities P(s2 = 1 ∣ M) (correct response) for compatible and incompatible conditions. (Top right) Marginal mean posterior P(M = 1) for compatibility. (Bottom row) Individual mean posteriors for compatible (left) and incompatible (right) trials. Results from full inference model, averaged as in Figure 4, shown solid and predictions of the continuum approximation (see equations 4.17 to 4.19) shown dashed. Keys identify individual curves.

Figure 8 illustrates that while the approximations developed here do not capture all the detailed behavior of the full model, they do provide reasonably good approximations to the average evolution of the posteriors over the course of a trial. Timescales are slightly misestimated, and the compatibility posterior (top right) fails to reproduce the slight dip below 0.9 that occurs for compatible trials at early times, but the relative orderings of all the posteriors are correctly predicted. Overall, absolute errors in mean posteriors, computed as described at the end of this section, lie between 0.002 and 0.05, the largest being for the compatibility posterior in the case of incompatible stimuli (top right, lower curves).

Predictions for the spatial uncertainty model follow from formula 4.17 in a similar manner, upon the substitution of values for a and b from equations 4.7 to 4.9, and using the initial conditions μ0 = log(1/4) for all four posteriors (see equation 3.11). For compatible stimuli, the function μ(t) is
4.20
for incompatible stimuli,
4.21
and in both cases,
4.22
The above results, presented in Figure 9, are not as good as those for the compatibility bias model. Nonetheless, the approximate model captures the key features of the evolving posteriors in the full model rather well, predicting the relative ordering of the posteriors appropriately in all cases except the incorrect choices P(HHH) and P(SHS) for incompatible stimuli; in that case, the approximation for P(SHS) diverges from the correct function, increasing rather than decreasing as t increases (lower right panel), for an absolute error of 0.12. Apart from this case, however, errors lie between 0.015 and 0.08.
Figure 9:

Predictions of the full and simplified spatial uncertainty models. (Top left) Marginal mean posterior probabilities P(s2 = 1 ∣ M) (correct response) for compatible and incompatible conditions. (Top right) Marginal mean posterior P(M = 1) for compatibility. (Bottom row) Individual mean posteriors for compatible (left) and incompatible (right) trials. Results from full inference model, averaged as in Figure 4, shown solid and predictions of the continuum approximation (see equations 4.17 and 4.20 to 4.22) shown dashed. Keys identify individual curves.

The errors for both models were computed for each mean posterior using the L1 norm, averaged over time steps, as follows:

$$
E = \frac{1}{N} \sum_{t=1}^{N} \left| p_t - \hat{p}_t \right|,
\tag{4.23}
$$

where pt and p̂t denote the posteriors predicted by the full and simplified models, respectively, and N is the number of time steps.

### 4.4.  Making Use of Explicit Mean Posteriors.

In addition to providing explicit expressions for posterior probabilities, the continuum limit also yields approximations for accuracy and reaction time distributions. To estimate accuracy as a function of response time under the free response protocol assumed by Yu et al. (in press), we compute the fraction of mass of the evolving probability density p(zti,1, zti,2) that exceeds a given threshold zti,1 + zti,2 = q at each time t (recall equation 2.11). This procedure overestimates first passage times, since some of the sample paths that lie beyond the threshold q at time t may have crossed at earlier times, but it permits some analytical simplification. Without loss of generality, we shall assume that s2 = 1.

The integral that we need to evaluate is
4.24
where we have used the shorthand notation p(zj, t) = p(zt1,j), and the approximation comes from assuming p(zt1,1, zt1,2) ≈ p(zt1,1)p(zt1,2) for the uncoupled and linearized approximate dynamical system. This assumption greatly simplifies the computations, although the uncoupled processes are not entirely independent since they are activated by common inputs (x1, x2, x3), albeit in different linear combinations. We also note that the variables zj should be nonnegative (cf. Figure 7). The domain of integration is pictured in Figure 12. The p(zj, t)'s take the forms derived in section 4.3, and since each is a normalized gaussian in the logarithmic y variables, the integral of their product over the entire positive quadrant is 1. Hence, we have
4.25
which is evaluated in appendix A to yield
4.26
where
4.27
Unfortunately, the final integral in equation 4.26 cannot be computed analytically, but it can be evaluated accurately and rapidly by numerical methods.
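The same quantity can also be estimated by Monte Carlo, which provides a useful check on the numerical evaluation of equation 4.26: under the factorization assumption of equation 4.24, the mass beyond threshold is simply P(z1 + z2 > q) for independent lognormal marginals. A sketch with illustrative (not fitted) parameters:

```python
import math
import random

def threshold_mass(mu1, s1, mu2, s2, q, n=50000, seed=1):
    """Monte Carlo estimate of P(z1 + z2 > q) for independent lognormals
    z_j = exp(y_j), y_j ~ N(mu_j, s_j^2); cf. the mass integral of eq. 4.24.

    As noted in the text, this counts mass beyond q at time t and ignores
    paths that crossed earlier and returned, so it is not a first-passage
    probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z1 = math.exp(rng.gauss(mu1, s1))
        z2 = math.exp(rng.gauss(mu2, s2))
        if z1 + z2 > q:
            hits += 1
    return hits / n

# Illustrative parameters for the two correct-response posteriors.
p = threshold_mass(math.log(0.45), 0.3, math.log(0.35), 0.3, q=0.9)
print(p)
```

As expected, the estimated mass decreases monotonically as the threshold q is raised.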
Response accuracy is approximated by the fraction of correct responses that exceed threshold:
4.28
where the denominator approximates the sum of all four probabilities zt1,1 + zt1,2 + zt2,1 + zt2,2. (The estimate of P(s2 = 2 ∣ Xt) is computed in a similar manner to equation 4.26, with the appropriate expressions for μ(t), σ(t) from section 4.3.) The denominator is the cumulative reaction time distribution, and so its derivative with respect to t provides the reaction time density. Hence, both accuracy and reaction time distributions can be approximated semianalytically. Figure 10 shows the resulting accuracy and reaction time approximations for the compatibility bias model for a particular setting of model parameters. The dip in accuracy for incompatible trials is reproduced, and after an initial rise in accuracy for compatible trials, accuracy slowly declines.
Figure 10:

Predictions of accuracy (left) and reaction time histograms (right) computed under the approximation of section 4.4. Solid curve and dark bars indicate compatible trials; dashed curve and light bars indicate incompatible trials.

As we have noted, sample paths of the SDE, equation 4.11, may pass across q and back, possibly repeatedly, in the interval (0, t), so these results do not directly correspond to the first-passage decision policy of the Bayesian models in Yu, Dayan, and Cohen (in press). This accounts for differences between the accuracy curves and reaction time distributions of Figure 10 and the free response results of section 3.3. For example, the compatibility bias free-response data of Figure 4 do not show the mild decline in accuracy for later compatible trials seen in Figure 10, although the spatial uncertainty simulations of Figure 4 do show such a decline. Nonetheless, the qualitative agreement between Figures 10 and 4 is quite good, and since the semiexplicit expressions of equations 4.26 and 4.27 replace the lengthy Monte Carlo simulations of section 3.3, they may be helpful in guiding parameter fits to data.

The posterior probability expressions can also be used to constrain parameter choices by requiring the derivative of the correct-response posterior at time t = 0 to be negative and finding the corresponding conditions on the parameters. The results of this computation (details not shown) agree closely with those in section 3.

### 4.5.  Fitting the Models to Data.

We now briefly describe the results of fitting the full models of section 2 and the reduced DD processes of sections 4.2 to 4.4 to the data of Servan-Schreiber et al. (1998), reproduced in Figure 1B. For the compatibility bias model, the parameters fitted are the noise level σ, prior β, threshold q, and step durations δt (for the DDM) and ΔT (for the full model), which determine the overall timescale. For spatial uncertainty, they are σ1, σ2, a1, q, and δt, ΔT (as in section 3.2, we set a2 = 1 − a1). To these we add one further parameter, T0, to account for time occupied by sensory decoding and motor response mechanisms, which superimposes a rightward shift on the RT distributions. (Such an “overhead time” might approximate the mean RT on a simple target detection task.)

We employ the same weighted Euclidean error norm as in Liu, Holmes, and Cohen (2008); see appendix A.2, below, for details. The parameter values obtained are as follows: compatibility bias: σ = 6.5, β = 0.87, q = 0.98, δt = 0.95 ms, ΔT = 1.04 ms, and T0 = 90 ms; spatial uncertainty: σ1 = 6.9, σ2 = 5.1, a1 = 0.71, q = 0.92, δt = 3.4 ms, ΔT = 0.33 ms, and T0 = 95 ms. Note that the noise levels are consistent with the assumptions of sections 3, 4.1, and 4.2: for example, 1/σ⁴ ≪ 1/σ² (cf. equation 3.2). The fitting errors are as follows: compatibility bias: full model 2.5; DDM 2.3; spatial uncertainty: full model 2.1; DDM 1.8. In fitting, we excluded data points in the first (0–100 ms) and the last (900–1000 ms) of the 10 RT bins, since no accuracy data are available for the former, and all trials in which responses exceeded 1000 ms were placed in the latter (note the uptick in RT distributions at the right-most data point). However, we computed model data in that bin and in the next one (1000–1100 ms). Since our fitted values of the overhead time T0 push even the shortest model RTs beyond 100 ms, accuracies cannot be computed for the 0–100 ms bin unless we assume some premature responses that are initiated before stimulus onset. For such premature responses, the equal prevalence of H and S in the experiments ensures that accuracy approaches chance at very short decision times (cf. the upper left panels of Figures 8 and 9). Indeed, this chance performance is unavoidable, independent of the inference or decision strategy, since the response is deprived of stimulus information and cannot possibly correlate with stimulus identity.

These results are shown in Figure 11. Fit qualities are slightly better for the spatial uncertainty model, and in both cases, perhaps surprisingly, fit errors are slightly smaller for the reduced DDM than for the full Bayesian procedure. The fit errors are similar to the error of 2.4 obtained in Liu et al. (2008) for the Gratton et al. (1988) data (see Figure 1A), using a DDM with variable drift rates derived from the neural network model of Cohen et al. (1992). That model contains eight free parameters, compared with five and six, respectively, in the present cases. Indeed, in Liu et al. (2008), six parameters are required to describe drift rates in the compatible and incompatible cases, modeling a progressive increase in attention to the central stimulus, and these cases are fitted separately. In this study, compatible and incompatible trials are fitted simultaneously, and a single parameter in each model (the compatibility prior β, or the weight a1), along with Bayesian updating, serves to describe the accumulation of evidence.

Figure 11:

Accuracy (upper curves in each panel) and reaction time distributions (lower curves) from the full (squares) and reduced DD (triangles) models for compatible (left) and incompatible (right) trials. Upper panels show compatibility bias and lower panels spatial uncertainty model results, respectively. Parameters were fitted to the data of Servan-Schreiber et al. (1998) (dashed curves with circles; cf. Figure 1B).

Both models underestimate mean RTs for compatible trials, producing an excess of points in the 200–250 ms RT bin. They are also unable to capture the drop in accuracy at the shortest RTs on compatible trials (left panels) due to the T0 behavior noted above. They do reproduce this drop on incompatible trials, although the full compatibility bias model does not exhibit the dip below 50%. The spatial uncertainty model is substantially better in this regard (lower right panel), although it underestimates accuracy in the 400–900 ms part of the RT range for both the compatible and incompatible cases. In preliminary work, we also tried a modified norm that preferentially weights low RT data: this slightly improved fits of RT distributions but did not affect compatible accuracy fits. We also fitted the full and DD models to the data of Gratton et al. (1988; see Figure 1A) obtaining similar fit qualities, although the failure to capture the steady rise from 50% accuracy at low RTs for compatible trials was more striking in that case (model results not shown here).

We note that individual subjects exhibit large differences in signal-to-noise ratios and thresholds (in DDM fits; cf. Ratcliff et al., 1999; Bogacz, Hu, Cohen, & Holmes 2007), and that here we have averaged over all subjects to produce single sets of fit parameters for each model. As illustrated in Figure 1, there is also substantial variability in Eriksen data, perhaps due to differing deadlining protocols. (Deadlines are necessary to produce enough short reaction times and hence obtain a significant dip in accuracy on incompatible trials.) The resulting variability in motor preparation times can affect reaction times, and no allowance for this is made in the inference model, which describes only cognitive processing. Our additional parameter T0 only partially accounts for this, and in this case, it deprives us of accuracy data in the smallest RT bin.

## 5.  Discussion and Conclusions

In Liu et al. (2008), a neural network model of the Eriksen task (Cohen et al., 1992; Servan-Schreiber et al., 1998) was linearized and reduced to a DDM with time-varying drift, allowing relatively complete analysis that reveals how parameters influence accuracy curves such as those of Figure 1. However, this network model involves somewhat arbitrary assumptions on architecture and parameters, and it is not clear how the DDM reduction of Liu et al., with its variable drift rate, relates to the optimal decision theory for the constant drift case (Bogacz et al., 2006). This article addresses this issue by offering analytically tractable approximations to two Bayesian inference models (compatibility bias and spatial uncertainty) proposed in Yu et al. (in press).

Specifically, the joint signal probability distribution of equation 2.4 is approximated as a linear sum, and then, by assuming that the sum of the nonnormalized posteriors remains close to one and taking a continuum limit, we obtain analytical expressions for the mean posterior probabilities. Employing a further approximation in which the net probabilities of having answered correctly or incorrectly at time t are computed, we derive semianalytical approximations for accuracy and reaction time distributions. While the latter correspond more closely to an “interrogation protocol” (Bogacz et al., 2006; Liu et al., 2008) in which subjects are cued to respond at specific times, and so differ quantitatively from those computed numerically for free responses (compare Figure 10 with Figure 4), the overall accuracy curves and individual posteriors derived from the continuum model reproduce those of the Bayesian model quite well (see Figures 8 and 9).

We therefore expect that our analytical approximations will be useful in guiding parameter selection when fitting models to experimental data. In section 3, we provide an example of this by deriving simple parametric constraints that must hold to obtain the dip below 50% in the posterior probability for early responses. Moreover, although the coefficients differ, the linearized update rules of both equations 3.3 and 3.9 demonstrate that the flanker inputs x1 and x3 work with the target input x2 for the compatible hypotheses and against it for the incompatible hypotheses. This underlying computational architecture gives rise to the same basic ability of both the compatibility bias and spatial uncertainty models to account for the dynamics of flanker interference in behavioral data. In section 4.5, we show that both the original models and DDM approximations derived from them can be fitted to experimental data, further strengthening our case.

Our analysis also reveals that a particularly simple stochastic differential equation, the constant-drift diffusion (DD) process of equation 4.11, approximately describes the evolution of Bayesian posteriors in log probability space. As described in Bogacz et al. (2006), this is a continuum limit of the sequential probability ratio test (Wald, 1947), which is known to be optimal for identifying noisy signals in two-alternative choice tasks (Wald & Wolfowitz, 1948). Moreover, it has been shown (Bogacz et al., 2006; Liu et al., 2008) that DD and related Ornstein-Uhlenbeck processes emerge naturally in linearized reductions of competing leaky accumulator models (Usher & McClelland, 2001) for 2AFC. In these neural networks, the difference between activities in a pair of units at the output decision or response stage behaves like the accumulating variable y(t) in equation 4.11 (Gold & Shadlen, 2001).1 DD models can also capture bottom-up (stimulus-driven) and top-down influences such as attention and expectation of rewards via variable drift rates (Liu et al., 2008; Eckhoff, Holmes, Law, Connolly, & Gold, 2008).
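The SPRT–DD correspondence invoked here can be made concrete with a short computation. For i.i.d. gaussian observations, each update adds an independent gaussian increment to the log likelihood ratio, so the accumulated evidence is a biased random walk whose continuum limit is a constant-drift DD process of the form of equation 4.11; under the correct hypothesis, the mean increment (the drift) equals the Kullback-Leibler divergence between the two hypotheses. A sketch with illustrative parameters:

```python
import math
import random

def loglr_increment(x, mu1, mu0, sigma):
    """One SPRT update: log p(x | H1) - log p(x | H0) for gaussian likelihoods
    N(mu1, sigma^2) and N(mu0, sigma^2)."""
    return ((mu1 - mu0) * x - 0.5 * (mu1 ** 2 - mu0 ** 2)) / sigma ** 2

mu0, mu1, sigma = 0.0, 0.2, 1.0
rng = random.Random(0)
# Observations drawn under H1: the mean increment (the drift of the limiting
# DD process) should equal the KL divergence (mu1 - mu0)^2 / (2 sigma^2).
incs = [loglr_increment(rng.gauss(mu1, sigma), mu1, mu0, sigma)
        for _ in range(200000)]
mean_inc = sum(incs) / len(incs)
print(mean_inc)
```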

Since accumulator models may be derived from biophysical models of spiking neurons (Wang, 2002; Wong & Wang, 2006), in which their activities represent short-term averages of collective firing rates, this suggests a mechanism by which neural substrates may be able to perform Bayesian computations. Specifically, in reducing the coupled Bayesian inference model, equation 2.9, to a DD process, we see how prior information maps into initial conditions, and evolving posteriors in log probability space are represented by spike rates of groups of neurons. In connection with the latter, we note that Bogacz and Gurney (2007) present computational and experimental evidence that Bayesian computations involving exponentiation and taking logarithms (cf. Yu & Dayan, 2005), as in section 4, can be approximated by neurons in the basal ganglia.

## Appendix: Mathematical and Data Fitting Details

### A.1.  Evaluation of Integrals.

To evaluate the integrals of equation 4.15, we employ the change of variables

$$
u = \frac{y - \mu(t)}{\sqrt{2}\,\sigma(t)},
\tag{A.1}
$$

so that dy = √2 σ(t) du, and the integrals become
A.2
The second expression is a standard error function integral, and the first may be put into the same form by completing the square in the argument of the exponent:
A.3
followed by a further change of variables,
A.4
This process results in the expressions of equation 4.16.
To evaluate the integral of equation 4.25, we proceed as follows, dropping the explicit reference to time dependence, which enters the expressions through the mean and standard deviations μ(t), σ(t). Figure 12 indicates the domain of integration:
A.5
Here we have added subscripts to the time-varying means and standard deviations μj(t), σj(t), using the same shorthand zj = z1,j as in section 4.4 to indicate which of the four cases s2 = 1, 2; M = 1, 2 enumerated in section 4.3 is intended.
Figure 12:

The integral of the joint posterior probability distribution is taken over the positive (z1, z2)-quadrant less the shaded triangular region.

### A.2.  Data Fitting Method.

Data fits were performed using the fmincon() function in Matlab. Parameters were determined by seeking minima of an error function, a weighted Euclidean norm that averages over accuracy and RT data for both compatible and incompatible trials. The usual Euclidean (L2) distance between vectors u and v with components uj and vj is

$$
d(u, v) = \Big[ \sum_j (u_j - v_j)^2 \Big]^{1/2}.
\tag{A.6}
$$
Vectors describing accuracies and RT histograms were first formed from the data, corresponding model predictions were formed, and their differences were computed by equation A.6. Since the units of accuracy and RT differ, each distance was then weighted by dividing it by the mean of the corresponding data vector, indicated by an overbar, producing the nondimensional quantity

$$
\epsilon = \frac{d(u^{\mathrm{acc}}, v^{\mathrm{acc}})}{\bar{u}^{\mathrm{acc}}} + \frac{d(u^{\mathrm{RT}}, v^{\mathrm{RT}})}{\bar{u}^{\mathrm{RT}}}.
\tag{A.7}
$$
This error term, representing the sum of percentage differences in accuracy and RT, was then minimized. Note that the resulting value depends on the number of RT bins in the data, and so should be normalized with respect to this when comparing fits of data sets with differing numbers of bins.
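For concreteness, the error norm of equations A.6 and A.7 can be written out as follows. This sketch follows our reading of the prose above; the function and variable names and the toy vectors are illustrative, and the actual fits minimized this quantity over model parameters with fmincon.

```python
import math

def euclidean(u, v):
    """Eq. A.6: L2 distance between equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fit_error(acc_data, acc_model, rt_data, rt_model):
    """Eq. A.7 as we read it: each L2 distance is nondimensionalized by the
    mean of the corresponding data vector, then the two terms are summed."""
    acc_bar = sum(acc_data) / len(acc_data)
    rt_bar = sum(rt_data) / len(rt_data)
    return (euclidean(acc_data, acc_model) / acc_bar
            + euclidean(rt_data, rt_model) / rt_bar)

# Toy accuracy and RT-histogram vectors (illustrative only).
acc_d, acc_m = [0.50, 0.80, 0.90, 0.95], [0.52, 0.78, 0.91, 0.94]
rt_d, rt_m = [0.10, 0.35, 0.30, 0.25], [0.12, 0.33, 0.31, 0.24]
err = fit_error(acc_d, acc_m, rt_d, rt_m)
print(err)
```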

## Acknowledgments

This work was supported by PHS grants MH58480 and MH62196 (Cognitive and Neural Mechanisms of Conflict and Control, Silvio M. Conte Center). Y.L. benefited from studentship support from the School of Engineering and Applied Science at Princeton University, and A.Y. received funding from an NIH NRSA institutional training grant. We thank the referees for perceptive and helpful comments.

## Notes

1. In N-alternative choice models, linear combinations of variables approximate (N − 1)-dimensional DD processes (Usher & McClelland, 2001; McMillen & Holmes, 2006).

## References

Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700–765.

Bogacz, R., & Gurney, K. (2007). The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Computation, 19, 442–477.

Bogacz, R., Hu, P., Cohen, J., & Holmes, P. (2007). Do humans select the speed-accuracy tradeoff maximizing reward rate? Manuscript submitted for publication.

Cohen, J., Dunbar, K., & McClelland, J. (1990). On the control of automatic processes: A parallel distributed processing model of the Stroop effect. Psychological Review, 97(3), 332–361.

Cohen, J., Servan-Schreiber, D., & McClelland, J. (1992). A parallel distributed processing approach to automaticity. American Journal of Psychology, 105, 239–269.

Eckhoff, P., Holmes, P., Law, C., Connolly, P., & Gold, J. (2008). On diffusion processes with variable drift rates as models for decision making during learning. New Journal of Physics, 10, 015006.

Eriksen, B., & Eriksen, C. (1974). Effects of noise letters upon the identification of a target letter in a non-search task. Perception and Psychophysics, 16, 143–149.

Gardiner, C. (1985). Handbook of stochastic methods (2nd ed.). New York: Springer.

Gold, J., & Shadlen, M. (2001). Neural computations that underlie decisions about sensory stimuli. Trends in Cognitive Science, 5(1), 10–16.

Gold, J., & Shadlen, M. (2002). Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions, and reward. Neuron, 36, 299–308.

Gratton, G., Coles, M. G. H., & Donchin, E. (1992). Optimizing the use of information: The strategic control of the activation of responses. J. Exp. Psych. General, 121, 480–506.

Gratton, G., Coles, M., Sirevaag, E., Eriksen, C., & Donchin, E. (1988). Pre- and poststimulus activation of response channels: A psychophysiological analysis. J. Exp. Psychol. Hum. Percept. Perform., 14, 331–344.

Higham, D. (2001). An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Rev., 43(3), 525–546.

Holmes, P., Shea-Brown, E., Moehlis, J., Bogacz, R., Gao, J., Aston-Jones, G., et al. (2005). Optimal decisions: From neural spikes, through stochastic differential equations, to behavior. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E88-A(10), 2496–2503.

Laming, D. (1968). Information theory of choice-reaction times. Orlando, FL: Academic Press.

Liu, Y., & Blostein, S. (1992). Optimality of the sequential probability ratio test for nonstationary observations. IEEE Transactions on Information Theory, 38(1), 177–182.

Liu, Y., Holmes, P., & Cohen, J. (2008). A neural network model of the Eriksen task: Reduction, analysis, and data fitting. Neural Computation, 20, 345–373.

McMillen, T., & Holmes, P. (2006). The dynamics of choice among multiple alternatives. J. Math. Psych., 50, 30–57.

Oksendal, B. (2002). Stochastic differential equations. Berlin: Springer.

Platt, M., & Glimcher, P. (2001). Neural correlates of decision variable in parietal cortex. Nature, 400, 233–238.

Ratcliff, R. (1978). A theory of memory retrieval. Psych. Rev., 85, 59–108.

Ratcliff, R., & Smith, P. (2004). A comparison of sequential sampling models for two-choice reaction time. Psychol. Rev., 111, 333–346.

Ratcliff, R., Van Zandt, T., & McKoon, G. (1999). Connectionist and diffusion models of reaction time. Psych. Rev., 106(2), 261–300.

Roitman, J., & Shadlen, M. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci., 22, 9475–9489.

Schall, J. (2001). Neural basis of deciding, choosing and acting. Nature Reviews Neuroscience, 2, 33–42.

Schall, J., Stuphorn, V., & Brown, J. (2002). Monitoring and control of action by the frontal lobes. Neuron, 36, 309–322.

Servan-Schreiber, D., Bruno, R., Carter, C., & Cohen, J. (1998). Dopamine and the mechanisms of cognition: Part I. A neural network model predicting dopamine effects on selective attention. Biological Psychiatry, 43, 713–722.

Shadlen, M., & Newsome, W. (2001). Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J. Neurophysiology, 86, 1916–1936.

Usher, M., & McClelland, J. (2001). On the time course of perceptual choice: The leaky competing accumulator model. Psych. Rev., 108, 550–592.

Wald, A. (1947). Sequential analysis. Hoboken, NJ: Wiley.

Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Ann. Math. Statist., 19, 326–339.

Wang, X.-J. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36, 955–968.

Wong, K.-F., & Wang, X.-J. (2006). A recurrent network mechanism of time integration in perceptual decisions. J. Neurosci., 26, 1314–1328.

Yu, A., Dayan, P., & Cohen, J. (in press). Dynamics of attentional selection under conflict: Toward a rational Bayesian account. J. Exp. Psych. Human Perception and Performance.

Yu, A., & Dayan, P. (2005). Inference, attention and decision in a Bayesian neural architecture. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in neural information processing systems, 17 (pp. 179–196). Cambridge, MA: MIT Press.