The Eriksen task is a classical paradigm that explores the effects of competing sensory inputs on response tendencies and the nature of selective attention in controlling these processes. In this task, conflicting flanker stimuli interfere with the processing of a central target, especially on short reaction time trials. This task has been modeled by neural networks and more recently by a normative Bayesian account. Here, we analyze the dynamics of the Bayesian models, which are nonlinear, coupled discrete time dynamical systems, by considering simplified, approximate systems that are linear and decoupled. Analytical solutions of these allow us to describe how posterior probabilities and psychometric functions depend on model parameters. We compare our results with numerical simulations of the original models and derive fits to experimental data, showing that agreements are rather good. We also investigate continuum limits of these simplified dynamical systems and demonstrate that Bayesian updating is closely related to a drift-diffusion process, whose implementation in neural network models has been extensively studied. This provides insight into how neural substrates can implement Bayesian computations.
The psychological (Laming, 1968; Ratcliff, 1978; Ratcliff, Van Zandt, & McKoon, 1999) and neural bases of decision making (Platt & Glimcher, 2001; Schall, 2001; Gold & Shadlen, 2001) have been widely studied, particularly in constrained situations such as the two-alternative forced-choice (2AFC) task. In 2AFC, subjects are required to discriminate a stimulus and give one of two permissible responses. The sequential probability ratio test (SPRT) is optimal for 2AFC tasks, whether the objective is to minimize the mean reaction time (RT) for a desired accuracy level (Wald & Wolfowitz, 1948) or to minimize a linear cost function in accuracy and detection delay under the Bayesian formulation (Liu & Blostein, 1992). The SPRT compares the relative likelihoods of noisy inputs given two possible hypotheses and reaches a decision when the cumulative evidence for one of them exceeds a fixed threshold. Performance on 2AFC tasks seems broadly consistent with the SPRT (Ratcliff & Smith, 2004), and there is evidence that competing neural populations subserving decision making may implement a strategy close to the SPRT (Gold & Shadlen, 2001, 2002; Schall, 2001; Shadlen & Newsome, 2001; Roitman & Shadlen, 2002; Schall, Stuphorn, & Brown, 2002). Moreover, the continuum limit of SPRT is an analytically tractable drift-diffusion model (DDM) (Holmes et al., 2005), which yields explicit expressions for error rates and reaction times that can be used to investigate reward-rate maximization in 2AFC (Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006), and it has been shown that various neural network models of decision making (Cohen, Dunbar, & McClelland, 1990; Cohen, Servan-Schreiber, & McClelland, 1992; Usher & McClelland, 2001) can be reduced to variants of the DDM (Bogacz et al., 2006).
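The SPRT's evidence-accumulation logic can be sketched in a few lines. The Gaussian observation model, drift, and threshold below are illustrative assumptions for a generic 2AFC discrimination, not parameters from any study cited here:

```python
import random

def sprt_2afc(mu=0.1, sigma=1.0, threshold=2.0, rng=random.Random(0)):
    """One SPRT trial for 2AFC: accumulate the log likelihood ratio of
    noisy samples under H1: N(+mu, sigma) vs H0: N(-mu, sigma), and
    respond when the cumulative evidence exceeds a fixed threshold."""
    logLR, t = 0.0, 0
    while abs(logLR) < threshold:
        x = rng.gauss(mu, sigma)        # the true stimulus is H1
        # log [p(x | H1) / p(x | H0)] for equal-variance Gaussians
        logLR += 2.0 * mu * x / sigma**2
        t += 1
    return logLR >= threshold, t        # (correct?, decision time in steps)
```

Raising the threshold trades longer decision times for higher accuracy, which is the speed-accuracy tradeoff that the optimality results of Wald and Wolfowitz (1948) formalize.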
The Eriksen flanker task (Eriksen & Eriksen, 1974) is an extension of the classical 2AFC task in which the decision is complicated by potentially conflicting distractor inputs. Subjects are required to discriminate a central target stimulus (e.g., the letter H or S) flanked by distractors. Flankers can be either compatible with the central target (e.g., HHHHH) or incompatible (e.g., HHSHH). Subjects display a compatibility effect, being typically slower and less accurate on incompatible than compatible trials (Eriksen & Eriksen, 1974). Furthermore, subjects perform at worse than chance level for short RTs for incompatible trials only. This dip in accuracy implies that flanker interference is particularly potent shortly after stimulus presentation. Figure 1 shows data from two instances of a deadlined Eriksen task. While specific details of reaction time distributions and relationships between accuracy and reaction time differ between the two studies, the basic compatibility effect and the dip in accuracy on incompatible trials are prominent in both.
Since the Eriksen task extends the standard 2AFC task, we suspect that optimal policy in this case is similar to the SPRT. In this vein, Yu, Dayan, and Cohen (in press) modeled the computations underlying the Eriksen task as iterative Bayesian updating, with the decision being made (and the trial terminated) when the cumulative posterior for one of the two possible target stimuli exceeds a fixed threshold. It was also proposed that the apparent suboptimality in performance can be explained by either an incorrect prior on the relative frequency of compatible and incompatible trials (compatibility bias model) or by inherent spatial overlap of visual processing neurons (spatial uncertainty model; Yu et al., in press). Here we reduce the Bayesian models to simpler dynamical systems and study them analytically and numerically. While the simpler models closely approximate the original ones in dynamics and performance, their analytical tractability yields explicit expressions for the dependence of inferential and psychometric quantities on model parameters. We discuss the relationship between exact Bayesian inference and drift-diffusion processes, emphasizing the link that this establishes between Bayesian updating and the neural substrates that may execute it. Our analysis also reveals the formal similarity of computations underlying the compatibility bias and spatial uncertainty models, which were motivated by disparate experimental literature and were formulated differently within the Bayesian framework.
The article is organized as follows. After reviewing the Bayesian inference models in section 2, in section 3 we derive and analyze the simplified models: uncoupled, linear discrete dynamical systems. From these, we derive explicit criteria on parameters that predict the dip in accuracy for incompatible trials, and we compare accuracies and RT distributions generated by the full and simplified models. In section 4 we show that the DDM is a continuum limit of the simplified models, and from this, we derive analytical predictions for mean posterior probabilities. We also compute accuracy versus time curves and reaction time distributions under an approximation that violates the first passage threshold crossing criterion adopted in Yu et al. (in press) but permits explicit analysis, and we provide direct comparisons between behavioral data and predictions of the full and approximate compatibility bias models. Section 5 contains a summary and discussion.
2. A Bayesian Framework for the Eriksen Task
Examples of accuracies and RTs thus predicted are shown in Figure 4. For these and other calculations, unless otherwise noted, we adopt the parameter values used in Yu et al. (in press): σ = 9 for the compatibility bias model and σ1 = 7, σ2 = 5, a1 = 0.7, a2 = 0.3 for the spatial uncertainty model, and q = 0.9 for both.
3. Linearization and Parametric Dependence
Yu et al. (in press) showed that certain choices of parameters allow both the compatibility bias and spatial uncertainty models to capture key properties of the behavioral data in Figure 1 (see Figure 4 below). Here we derive general constraints on the parameters in each model that allow them to reproduce the behavioral data: σ for the compatibility bias model; a1, a2, σ1, and σ2 for the spatial uncertainty model; and n, the number of distractors. While we cannot analyze the complex relationship between accuracy and reaction time directly, we wish to at least constrain parameters so that the mean posterior probability for s2 = 1 (the correct answer) dips below that for s2 = −1 after one or a few time steps of observations. Although the relative probability of a correct response at time t depends not just on the mean but also on higher-order moments, such an analysis would illuminate the magnitude and range of the effective parameters. Unfortunately, even this partial analysis is difficult for the original Bayesian model, as P(s2 ∣ Xt) involves the summation of two exponential functions of the inputs, as in equation 2.11, and there is no obvious way to derive the expectation of P(s2 ∣ Xt) as an explicit function of the parameters that specify the generation of the inputs X.
Due to such computational intractability, we instead work with a linearized approximation to the exact posterior update rule of equation 2.9. We will motivate and describe the approximations for the two Bayesian models and demonstrate via simulations that the parametric constraints derived from this approximate scheme provide useful bounds for the original Bayesian models.
3.1. The Compatibility Bias Model.
Hence, β > 3/4 is the necessary and sufficient condition for the average posterior probability for s2 = 1 to dip below that for s2 = −1 after one observation, when the true stimuli are the incompatible array (−1, 1, −1). More generally, we can show that the constraint is β > (n + 1)/(2n), where n is the total number of flankers. This makes intuitive sense, for it suggests that the dip is more prominent or more likely to happen when the subject's prior compatibility bias is stronger or the number of flankers is larger. Indeed, there are behavioral data suggesting that flanker interference effects are stronger when there is a lower frequency of incompatible trials (Gratton, Coles, & Donchin, 1992).
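The constraint β > (n + 1)/(2n) stated above is easy to evaluate numerically; a minimal sketch:

```python
def beta_bound(n):
    """Lower bound on the prior compatibility bias beta needed for the
    mean posterior for the correct target to dip below that for the
    incorrect one after one observation, with n flankers (from the
    linearized analysis of the compatibility bias model)."""
    return (n + 1) / (2 * n)

# The bound falls toward 1/2 as flankers are added, so a weaker bias
# suffices to produce the dip when there are more flankers.
print([round(beta_bound(n), 4) for n in (1, 2, 4, 8)])
# [1.0, 0.75, 0.625, 0.5625]
```

With n = 2 (the (−1, 1, −1) array) this recovers the bound β > 3/4, and with a single flanker (n = 1) no admissible prior produces the dip.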
These analytical constraints guarantee a dip only in the posterior probability. As shown in Figure 2 (left), for a particular set of model parameters, the mean accuracy in compatible trials terminating within 20 time steps steadily decreases as a function of β, and it drops below .5, indicating the presence of a dip, for all values of β > 0.82: somewhat higher than β = 0.75, the lower bound of inequality (see equation 3.7) that results in a dip in posterior probability.
A major factor underlying the discrepancy between the two constraints is that we considered only the mean of the posterior probability, not the full distribution. The mean accuracy depends not only on the mean posterior value but also on higher moments. If the distribution were symmetric about its mean, the dip in the mean posterior would directly translate into a dip in accuracy, but as we will show in section 4, the distribution of the posterior trajectories is strongly skewed, and the interaction of that skewness with the decision threshold also plays a role in determining the presence of the dip in accuracy versus reaction time.
A second reason for the discrepancy is that the theoretical bounds are for the dip to occur in the posterior after one time step, whereas in the numerical simulations, due to the infrequency of responses at very short RTs, all trials that terminate within the first 20 time steps were used to estimate accuracy. If the temporal extent of the dip in the posterior distribution is very small (which is likely in boundary cases), then conditional accuracy may not fall below 0.5 when averaged over 20 time steps. The numerically obtained constraints are therefore likely to be more conservative than the analytical approximations.
3.2. The Spatial Uncertainty Model.
3.3. Evaluating the Cost of Linearization.
Direct simulations of the linear approximation can be compared with those of the original inference model. Figure 3 shows the results for the compatibility bias model for a particular setting of parameters (σ = 9), comparing the full inference model with the simplified iteration of equation 3.6. The same sequence of noisy observations xi(t) was used for both processes, and in computing the posterior values for the latter at each time step t, normalization was applied only at that step. The agreement is remarkably good, validating our linear approximations to the products of probabilities developed in section 3. The quality of the linear approximation for the spatial uncertainty model is similarly good (details not shown).
We can also simulate perceptual discrimination based on the linearized evidence accumulation process, using the first passage criterion for threshold crossing appropriate for free response conditions. As in Yu et al. (in press), we adopt the decision threshold q = 0.9 for both the compatibility bias and the spatial uncertainty model. The time span, taken here as 200 steps, is divided into 10 bins, and sample paths for the full model, equation 2.9, and the approximate decoupled system, equation 3.6, and its analogue for spatial uncertainty are computed. The decoupled results are then normalized by dividing by the sum ∑i,j vi,j(t) at each t in the current bin (normalization is not applied for steps 1 through t − 1). The same (unit) step size is used in all cases. Responses are logged when the first of the probabilities P(s2 = +1 ∣ Xt) = P(s2 = +1, M = 1 ∣ Xt) + P(s2 = +1, M = 2 ∣ Xt) or P(s2 = −1 ∣ Xt) = P(s2 = −1, M = 1 ∣ Xt) + P(s2 = −1, M = 2 ∣ Xt) crosses q. After collecting sufficiently many paths (2000 in this case), response time histograms are formed, and the fraction of correct responses in each bin is computed to yield accuracy versus time curves.
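The response-logging procedure just described can be mimicked with a schematic two-hypothesis version of the decoupled iteration. The drift and noise values below are illustrative stand-ins rather than the fitted model parameters, and only two competing hypotheses are kept instead of the four (s2, M) pairs:

```python
import math, random

def free_response_trial(drifts=(0.1, -0.1), sigma=0.5, q=0.9, max_t=200,
                        rng=random.Random(1)):
    """Schematic first-passage trial: unnormalized log evidence for two
    hypotheses accumulates linearly; at each step only the current
    snapshot is normalized, and a response is logged when either
    posterior crosses the threshold q."""
    logv = [0.0, 0.0]
    for t in range(1, max_t + 1):
        for i, d in enumerate(drifts):
            logv[i] += d + rng.gauss(0.0, sigma)
        m = max(logv)                        # normalize only at this step
        w = [math.exp(v - m) for v in logv]
        p = w[0] / (w[0] + w[1])
        if p >= q:
            return True, t                   # hypothesis 0 (correct) wins
        if 1.0 - p >= q:
            return False, t                  # hypothesis 1 (error) wins
    return p >= 0.5, max_t                   # no crossing: forced choice
```

Collecting many such (correct, RT) pairs and binning them by RT yields accuracy versus time curves and RT histograms of the kind plotted in Figure 4.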
Figure 4 illustrates the results of such simulations for the compatibility and spatial uncertainty models. Accuracy versus reaction time, and empirical distributions of reaction time, are shown for both the full and approximate models. The approximate systems reproduce the characteristic dip in accuracy for fast, incompatible trials for both models, and the accuracy curves and reaction time distributions predicted by the approximate theory agree well with those of the full inference models. Note that the use of the first passage criterion for response produces reaction time distributions that agree with the exact model in details of their shapes: a rise at short reactions times to a peak, followed by a long tail. The distributions for incompatible trials are also flatter and shifted rightward compared to those for compatible trials, as in the data of Figure 1.
4. A Continuum Limit
The key difficulty in working with the discrete dynamical systems 3.3 and 3.9 lies in the nonlinear coupling of the posteriors zi, j(t) through the denominators Dt and Dt′. It can be proved that individual sample paths generated with the same noise inputs are identical whether computed by iteration of equations 3.3 and 3.9 or by the analogous uncoupled systems, equation 3.6, with posteriors normalized only at the last time step (cf. equation 3.5). (In computing the values for the approximate model 3.6 at each step t for Figure 3, normalization was applied only at that step, but not at steps 1 through t − 1, while the full iteration, equation 2.9, is normalized at every step.) However, it does not follow that we may average over many realizations of the unnormalized process and then normalize (as discussed further in section 4.3), since these operations do not commute. Nonetheless, we can decouple the dynamics by replacing the normalization constant Dt at each time step with its expectation 〈Dt〉, which does not depend on the inputs, and by replacing that in turn by a constant. We then take continuum limits of the resulting decoupled linear systems to form stochastic differential equations (SDEs), allowing us to use simple analytical results to compute properties of interest. As described in section 5, these SDEs may be related to neurally based models of evidence accumulation.
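In such a continuum limit the decoupled linear update becomes an SDE of drift-diffusion type, dy = A dt + c dW. A minimal Euler-Maruyama sketch (with arbitrary, illustrative parameter values) shows the discrete-to-continuous correspondence:

```python
import random

def ddm_path(A=0.05, c=0.3, dt=0.1, T=20.0, rng=random.Random(0)):
    """Euler-Maruyama integration of the drift-diffusion SDE
    dy = A dt + c dW. The drift A, noise c, and horizon T are
    illustrative only, not values from the fitted models."""
    y, path = 0.0, [0.0]
    steps = int(round(T / dt))
    for _ in range(steps):
        # Gaussian increment with mean A*dt and variance c**2 * dt
        y += A * dt + c * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        path.append(y)
    return path
```

Averaged over many realizations, y(T) has mean A·T and variance c²·T, which is what makes the limiting process analytically tractable.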
4.1. Approximating the Denominators.
4.2. Taking the Continuum Limit.
4.3. Analytical Approximations for the Mean Posteriors.
As noted at the beginning of this section, normalization and averaging do not commute. This may be understood in terms of the distributions of Figure 7 as follows. While each sample path can be computed for the uncoupled processes and normalized at time t to yield the same result as a sample path of the coupled system (cf. Figure 3), different normalization factors must typically be applied to the values of different paths zi, j(t) at each time t. This would distort the distributions p(zi, j, t), thereby changing their means. However, we may appeal to the observation that the expected value of the denominator remains close to 1 (cf. Figure 5) to conclude that this distortion is likely to be small, and proceed by dividing by the sums of the four mean probability trajectory values at time t to normalize the resulting expressions.
Typical results for mean posterior probabilities are shown in Figure 8. The approximate predictions developed above are shown as dashed curves and the results of averaging over 5000 simulated trials of the full inference model, equation 2.9, are shown solid; compatible and incompatible trials are shown in dark and light, respectively. As above, we compute 200 steps for the discrete iteration of the full system, and we evaluate the corresponding quantities for t ∈ [0, 200] time units from the formulas above. For P(M) = 0.5 (not shown), joint posteriors for correct responses increase similarly for both compatible and incompatible cases, but P(M) = 0.9 elicits markedly different behaviors (top left). The compatibility posteriors show a general rise for compatible stimuli and a monotonic fall for incompatible stimuli, but the posterior probability shows a significant dip below 0.5 at early times for incompatible stimuli, while it rises monotonically for compatible stimuli. As discussed in section 5, the resulting accuracies exhibit similar patterns to the experimental data, with the incompatible case showing a dip in accuracy for early responses. Evolutions of the four individual posterior probabilities are shown in the lower panels of Figure 8.
Figure 8 illustrates that while the approximations developed here do not capture all the detailed behavior of the full model, they do provide reasonably good approximations to the average evolutions of the posteriors over the course of a trial. Timescales are slightly misestimated, and the compatibility posterior (top right) fails to reproduce the slight dip below 0.9 that occurs for compatible trials at early times, but the relative orderings of all the posteriors are correctly predicted. Overall, absolute errors in mean posteriors, computed as described at the end of this section, lie between 0.002 and 0.05, the largest occurring in the case of incompatible stimuli (top right, lower curves).
4.4. Making Use of Explicit Mean Posteriors.
In addition to providing explicit expressions for posterior probabilities, the continuum limit also yields approximations for accuracy and reaction time distributions. To estimate accuracy as a function of response time under the free response protocol assumed by Yu et al. (in press), we compute the fraction of mass of the evolving probability density p(zi,1(t), zi,2(t)) that exceeds a given threshold zi,1(t) + zi,2(t) = q at each time t (recall equation 2.11). This procedure overestimates first passage times, since some of the sample paths that lie beyond the threshold q at time t may have crossed at earlier times, but it permits some analytical simplification. Without loss of generality, we shall assume that s2 = 1.
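Under the continuum approximation the accumulating variable at time t is Gaussian, so the suprathreshold mass has a closed form. The sketch below uses generic symbols (drift A, noise c, starting point y0) rather than the model-specific expressions of the text:

```python
import math

def mass_beyond_threshold(q, A, c, t, y0=0.0):
    """Fraction of the density of y(t) ~ N(y0 + A*t, c**2 * t) lying at
    or beyond the threshold q. As noted in the text, this
    interrogation-style criterion overestimates first-passage responses,
    since paths counted at time t may have crossed q earlier."""
    if t <= 0:
        return 1.0 if y0 >= q else 0.0
    mu, sd = y0 + A * t, c * math.sqrt(t)
    # Gaussian upper-tail probability via the complementary error function
    return 0.5 * math.erfc((q - mu) / (sd * math.sqrt(2.0)))
```

For positive drift this mass grows from near zero toward one as t increases, producing accuracy versus time curves of the kind compared with simulations in Figure 10.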
As we have noted, sample paths of the SDE, equation 4.11, may pass across q and back, possibly repeatedly, in the interval (0, t), so these results do not directly correspond to the first-passage decision policy of the Bayesian models in Yu, Dayan, and Cohen (in press). This accounts for differences between the accuracy curves and reaction time distributions of Figure 1 and the free response results of section 3.3. For example, the compatibility bias-free response data of Figure 4 do not show the mild decline in accuracy for later compatible trials of Figure 10, although the spatial uncertainty simulations of Figure 4 do show such a decline. Nonetheless, the qualitative agreement between Figures 10 and 4 is quite good, and since the semiexplicit expression of equations 4.26 and 4.27 replaces the lengthy Monte Carlo simulations of section 3.3, it may be helpful in guiding parameter fits to data.
The posterior probability expressions can also be used to constrain parameter choices by requiring the derivative of the mean posterior at time t = 0 to be negative and finding corresponding conditions on the parameters. The results of this computation (details not shown) agree closely with those in section 3.
4.5. Fitting the Models to Data.
We now briefly describe the results of fitting the full models of section 2 and the reduced DD processes of sections 4.2 to 4.4 to the data of Servan-Schreiber et al. (1998), reproduced in Figure 1B. For the compatibility bias model, the parameters fitted are the noise level σ, prior β, threshold q, and step durations δt (for the DDM) and ΔT (for the full model), which determine the overall timescale. For spatial uncertainty, they are σ1, σ2, a1, q, and δt, ΔT (as in section 3.2, we set a2 = 1 − a1). To these we add one further parameter, T0, to account for time occupied by sensory decoding and motor response mechanisms, which superimposes a rightward shift on the RT distributions. (Such an “overhead time” might approximate the mean RT on a simple target detection task.)
We employ the same weighted Euclidean error norm as in Liu, Holmes, and Cohen (2008); see appendix A.2, below, for details. The parameter values obtained are as follows: compatibility bias: σ = 6.5, β = 0.87, q = 0.98, δt = 0.95 ms, ΔT = 1.04 ms, and T0 = 90 ms; spatial uncertainty: σ1 = 6.9, σ2 = 5.1, a1 = 0.71, q = 0.92, δt = 3.4 ms, ΔT = 0.33 ms, and T0 = 95 ms. Note that the noise levels are consistent with the assumptions of sections 3, 4.1, and 4.2: for example, 1/σ4 ≪ 1/σ2 (cf. equation 3.2). The fitting errors are as follows: compatibility bias: full model 2.5; DDM 2.3; spatial uncertainty: full model 2.1; DDM 1.8. In fitting, we excluded data points in the first (0–100 ms) and the last (900–1000 ms) of the 10 RT bins, since no accuracy data are available for the former, and all trials in which responses exceeded 1000 ms were placed in the latter (note the uptick in RT distributions at the right-most data point). However, we computed model data in that bin and in the next one (1000–1100 ms). Since our fitted values of the overhead time T0 push even the shortest model RTs beyond 100 ms, accuracies cannot be computed for the 0–100 ms bin unless we assume some premature responses that are initiated before stimulus onset. For such premature responses, the equal prevalence of H and S in the experiments ensures that accuracy approaches chance at very short decision times (cf. the upper left panels of Figures 8 and 9). Indeed, this chance performance is unavoidable, independent of the inference or decision strategy, since the response is deprived of stimulus information and cannot possibly correlate with stimulus identity.
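The fitting criterion can be pictured as a weighted Euclidean distance between model and data summary statistics. The function below is a generic stand-in, since the actual weights follow Liu et al. (2008) and are detailed in appendix A.2:

```python
def weighted_error(model, data, weights):
    """Weighted Euclidean error norm between model predictions and data
    points (schematic; the weighting actually used in the fits follows
    Liu et al., 2008, and is described in appendix A.2)."""
    assert len(model) == len(data) == len(weights)
    return sum(w * (m - d) ** 2
               for m, d, w in zip(model, data, weights)) ** 0.5
```

Minimizing such a norm over the free parameters, with accuracies and binned RT frequencies stacked into the model and data vectors, yields fitted values like those reported above.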
These results are shown in Figure 11. Fit qualities are slightly better for the spatial uncertainty model, and in both cases, perhaps surprisingly, fit errors are slightly smaller for the reduced DDM than for the full Bayesian procedure. The fit errors are similar to the value of 2.4 obtained in Liu et al. (2008) for the Gratton et al. (1988) data (see Figure 1A), using a DDM with variable drift rates derived from the neural network model of Cohen et al. (1992). That model contains eight free parameters, compared with five and six, respectively, in the present cases. Indeed, in Liu et al. (2008), six parameters are required to describe drift rates in the compatible and incompatible cases, modeling progressive increase in attention to the central stimulus, and these cases are fitted separately. In this study, compatible and incompatible trials are fitted simultaneously, and a single parameter in each model (the compatibility prior β, or the weight a1), along with Bayesian updating, serves to describe the accumulation of evidence.
Both models underestimate mean RTs for compatible trials, producing an excess of points in the 200–250 ms RT bin. They are also unable to capture the drop in accuracy at the shortest RTs on compatible trials (left panels) due to the T0 behavior noted above. They do reproduce this drop on incompatible trials, although the full compatibility bias model does not exhibit the dip below 50%. The spatial uncertainty model is substantially better in this regard (lower right panel), although it underestimates accuracy in the 400–900 ms part of the RT range for both the compatible and incompatible cases. In preliminary work, we also tried a modified norm that preferentially weights low RT data: this slightly improved fits of RT distributions but did not affect compatible accuracy fits. We also fitted the full and DD models to the data of Gratton et al. (1988; see Figure 1A) obtaining similar fit qualities, although the failure to capture the steady rise from 50% accuracy at low RTs for compatible trials was more striking in that case (model results not shown here).
We note that individual subjects exhibit large differences in signal-to-noise ratios and thresholds (in DDM fits; cf. Ratcliff et al., 1999; Bogacz, Hu, Cohen, & Holmes, 2007), and that here we have averaged over all subjects to produce single sets of fit parameters for each model. As illustrated in Figure 1, there is also substantial variability in Eriksen data, perhaps due to differing deadlining protocols. (Deadlines are necessary to produce enough short reaction times and hence obtain a significant dip in accuracy on incompatible trials.) The resulting variability in motor preparation times can affect reaction times, and no allowance for this is made in the inference model, which describes only cognitive processing. Our additional parameter T0 only partially accounts for this, and in this case, it deprives us of accuracy data in the smallest RT bin.
5. Discussion and Conclusions
In Liu et al. (2008), a neural network model of the Eriksen task (Cohen et al., 1992; Servan-Schreiber et al., 1998) was linearized and reduced to a DDM with time-varying drift, allowing relatively complete analysis that reveals how parameters influence accuracy curves such as those of Figure 1. However, this network model involves somewhat arbitrary assumptions on architecture and parameters, and it is not clear how the DDM reduction of Liu et al., with its variable drift rate, relates to the optimal decision theory for the constant drift case (Bogacz et al., 2006). This article addresses this issue by offering analytically tractable approximations to two Bayesian inference models (compatibility bias and spatial uncertainty) proposed in Yu et al. (in press).
Specifically, the joint signal probability distribution of equation 2.4 is approximated as a linear sum, and then, by assuming that the sum of the nonnormalized posteriors remains close to one and taking a continuum limit, we obtain analytical expressions for the mean posterior probabilities. Employing a further approximation in which the net probabilities of having answered correctly or incorrectly at time t are computed, we derive semianalytical approximations for accuracy and reaction time distributions. While the latter correspond more closely to an “interrogation protocol” (Bogacz et al., 2006; Liu et al., 2008) in which subjects are cued to respond at specific times, and so differ quantitatively from those computed numerically for free responses (compare Figure 10 with Figure 4), the overall accuracy curves and individual posteriors derived from the continuum model reproduce those of the Bayesian model quite well (see Figures 8 and 9).
We therefore expect that our analytical approximations will be useful in guiding parameter selection when fitting models to experimental data. In section 3, we provide an example of this by deriving simple parametric constraints that must hold to obtain the dip below 50% in the posterior probability for early responses. Moreover, although the coefficients differ, the linearized update rules of both equations 3.3 and 3.9 demonstrate that the flanker inputs x1 and x3 work with the target input x2 for the compatible hypotheses and against it for the incompatible hypotheses. This underlying computational architecture gives rise to the same basic ability of both the compatibility bias and spatial uncertainty models to account for the dynamics of flanker interference in behavioral data. In section 4.5, we show that both the original models and DDM approximations derived from them can be fitted to experimental data, further strengthening our case.
Our analysis also reveals that a particularly simple stochastic differential equation, the constant-drift diffusion (DD) process of equation 4.11, approximately describes the evolution of Bayesian posteriors in log probability space. As described in Bogacz et al. (2006), this is a continuum limit of the sequential probability ratio test (Wald, 1947), which is known to be optimal for identifying noisy signals in two-alternative choice tasks (Wald & Wolfowitz, 1948). Moreover, it has been shown (Bogacz et al., 2006; Liu et al., 2008) that DD and related Ornstein-Uhlenbeck processes emerge naturally in linearized reductions of competing leaky accumulator models (Usher & McClelland, 2001) for 2AFC. In these neural networks, the difference between activities in a pair of units at the output decision or response stage behaves like the accumulating variable y(t) in equation 4.11 (Gold & Shadlen, 2001). DD models can also capture bottom-up (stimulus-driven) and top-down influences such as attention and expectation of rewards via variable drift rates (Liu et al., 2008; Eckhoff, Holmes, Law, Connolly, & Gold, 2008).
Since accumulator models may be derived from biophysical models of spiking neurons (Wang, 2002; Wong & Wang, 2006), in which their activities represent short-term averages of collective firing rates, this suggests a mechanism by which neural substrates may be able to perform Bayesian computations. Specifically, in reducing the coupled Bayesian inference model, equation 2.9, to a DD process, we see how prior information maps into initial conditions, and evolving posteriors in log probability space are represented by spike rates of groups of neurons. In connection with the latter, we note that Bogacz and Gurney (2007) present computational and experimental evidence that Bayesian computations involving exponentiation and taking logarithms (cf. Yu & Dayan, 2005), as in section 4, can be approximated by neurons in the basal ganglia.
Appendix: Mathematical and Data Fitting Details
A.1. Evaluation of Integrals.
A.2. Data Fitting Method.
This work was supported by PHS grants MH58480 and MH62196 (Cognitive and Neural Mechanisms of Conflict and Control, Silvio M. Conte Center). Y.L. benefited from studentship support from the School of Engineering and Applied Science at Princeton University, and A.Y. received funding from an NIH NRSA institutional training grant. We thank the referees for perceptive and helpful comments.