Primate vision is characterized by constant, sequential processing and selection of visual targets to fixate. Although expected reward is known to influence both processing and selection of visual targets, similarities and differences between these effects remain unclear mainly because they have been measured in separate tasks. Using a novel paradigm, we simultaneously measured the effects of reward outcomes and expected reward on target selection and sensitivity to visual motion in monkeys. Monkeys freely chose between two visual targets and received a juice reward with varying probability for eye movements made to either of them. Targets were stationary apertures of drifting gratings, causing the end points of eye movements to these targets to be systematically biased in the direction of motion. We used this motion-induced bias as a measure of sensitivity to visual motion on each trial. We then performed different analyses to explore effects of objective and subjective reward values on choice and sensitivity to visual motion to find similarities and differences between reward effects on these two processes. Specifically, we used different reinforcement learning models to fit choice behavior and estimate subjective reward values based on the integration of reward outcomes over multiple trials. Moreover, to compare the effects of subjective reward value on choice and sensitivity to motion directly, we considered correlations between each of these variables and integrated reward outcomes on a wide range of timescales. We found that, in addition to choice, sensitivity to visual motion was also influenced by subjective reward value, although the motion was irrelevant for receiving reward. Unlike choice, however, sensitivity to visual motion was not affected by objective measures of reward value. Moreover, choice was determined by the difference in subjective reward values of the two options, whereas sensitivity to motion was influenced by the sum of values. 
Finally, models that best predicted visual processing and choice used sets of estimated reward values based on different types of reward integration and timescales. Together, our results demonstrate separable influences of reward on visual processing and choice, and point to the presence of multiple brain circuits for the integration of reward outcomes.
Primates make approximately three to four saccadic eye movements each second, and thus, the choice of where to fixate next is our most frequently made decision. The next fixation location is determined not only by visual salience (Itti & Koch, 2000) but also by internal goals and reward expected from the foveated target (Schütz, Trommershäuser, & Gegenfurtner, 2012; Markowitz, Shewcraft, Wong, & Pesaran, 2011; Navalpakkam, Koch, Rangel, & Perona, 2010). Brain structures known to be involved in the control of saccadic eye movement have been extensively studied as a means of understanding the neural basis of decision-making (Sugrue, Corrado, & Newsome, 2005; Glimcher, 2003). Interestingly, the same structures also appear to contribute to the selective processing of targeted visual stimuli that tends to accompany saccades (Squire, Noudoost, Schafer, & Moore, 2013). Thus, it is conceivable that reward outcomes and expected reward (i.e., subjective reward value) control saccadic choice and processing of targeted visual stimuli via similar mechanisms.
Our current knowledge of how reward outcomes and subjective reward value influence the processing of visual information and saccadic choice comes from separate studies using different experimental paradigms. For instance, the effects of reward on saccadic choice are studied using tasks that involve probabilistic reward outcomes (Farashahi, Azab, Hayden, & Soltani, 2018; Chen & Stuphorn, 2015; Strait, Blanchard, & Hayden, 2014; Liston & Stone, 2008; Platt & Glimcher, 1999) as well as tasks with dynamic reward schedules (Costa, Dal Monte, Lucas, Murray, & Averbeck, 2016; Donahue & Lee, 2015; Schütz et al., 2012; Lau & Glimcher, 2007; Barraclough, Conroy, & Lee, 2004; Sugrue, Corrado, & Newsome, 2004), both of which require estimation of subjective reward value. In contrast, the effects of reward on the processing of visual information have been mainly examined using tasks involving unequal expected reward outcomes without considering the subjective valuation of reward outcomes (Rakhshan et al., 2020; Barbaro, Peelen, & Hickey, 2017; Hickey & Peelen, 2017; Anderson, 2016; Hickey, Chelazzi, & Theeuwes, 2010, 2014; Anderson, Laurent, & Yantis, 2011a, 2011b; Della Libera & Chelazzi, 2006, 2009; Peck, Jangraw, Suzuki, Efem, & Gottlieb, 2009). More importantly, none of the previous studies has explored the effects of reward on choice and processing of visual information simultaneously. As a result, the relationship between these effects is currently unknown.
Understanding this relationship is important because the extent to which reward influences sensory processing could impact decision-making independently of the direct effects of reward on choice. For example, in controlled decision-making paradigms or natural foraging settings, recent harvest of reward after saccades or visits to certain parts of the visual field or space could enhance processing of features of the targets that appear in those parts of space, ultimately biasing choice behavior. Such an influence of reward on sensory processing could have strong effects on choice behavior during tasks with dynamic reward schedules that require flexible integration of reward outcomes over time (Bari et al., 2019; Farashahi, Donahue, et al., 2017; Farashahi, Rowe, Aslami, Lee, & Soltani, 2017; Donahue & Lee, 2015; Soltani & Wang, 2006, 2008; Lau & Glimcher, 2007; Sugrue et al., 2004). In addition to improving our understanding of choice behavior, elucidating the relationship between sensory and reward processing can help disambiguate neural mechanisms underlying attention and reward (Maunsell, 2004, 2015; Hikosaka, 2007) and clarify how deficits in deployment of selective attention, which is characterized by changes in sensory processing, are affected by abnormalities in reward circuits (Volkow et al., 2009).
Here, we used a novel experimental paradigm with a dynamic reward schedule to simultaneously measure the influences of reward on choice between available targets and processing of visual information of these targets. We exploited the influence of visual motion on the trajectory of saccadic eye movements (Schafer & Moore, 2007), motion-induced bias (MIB), to quantify sensitivity to visual motion as a behavioral readout of visual processing in a criterion-free manner. Using this measure in the context of a saccadic free-choice task in monkeys allowed us to simultaneously estimate how reward feedback is integrated to determine both visual processing and decision-making on a trial-by-trial basis. We then used different approaches to compare the effects of objective reward value (i.e., total harvested reward, and more vs. less rewarding target based on task parameters) and subjective reward value (i.e., estimated reward values of the two targets using choice data) on decision-making and visual processing. To estimate subjective reward values on each trial, we fit choice behavior using multiple reinforcement learning (RL) models to examine how animals integrated reward outcomes over time to determine choice. On the basis of the literature on reward learning, the difference in subjective values should drive choice behavior. The MIB could be independent of subjective reward value, or it could depend on subjective values similarly to or differently from choice. To test these alternative possibilities, we then used correlation between the MIB and estimated subjective values based on different integrations of reward feedback and on different timescales to examine similarities and differences between the effects of subjective reward value on choice and visual processing.
We found that both choice and sensitivity to visual motion were affected by reward although visual motion was irrelevant for obtaining reward in our experiment. However, there were separable influences of reward on these two processes. First, choice was modulated by both objective and subjective reward values, whereas sensitivity to visual motion was mainly influenced by subjective reward value. Second, choice was most strongly correlated with the difference in subjective values of the chosen and unchosen targets, whereas sensitivity to visual motion was most strongly correlated with the sum of subjective values. Finally, choice and sensitivity to visual motion were best predicted based on different types of reward integration and integration on different timescales.
Two male monkeys (Macaca mulatta) weighing 6 kg (Monkey 1) and 11 kg (Monkey 2) were used as subjects in the experiment. The two monkeys completed 160 experimental sessions (74 and 86 sessions for Monkeys 1 and 2, respectively) on separate days in the free-choice task for a total of 42,180 trials (10,096 and 32,084 trials for Monkeys 1 and 2, respectively). Each session consisted of approximately 140 and 370 trials for Monkeys 1 and 2, respectively. All surgical and behavioral procedures were approved by the Stanford University Administrative Panel on Laboratory Animal Care and the consultant veterinarian and were in accordance with National Institutes of Health and Society for Neuroscience guidelines.
Saccade targets were drifting sinusoidal gratings within stationary, 5°–8° Gaussian apertures. Gratings had a spatial frequency of 0.5 cycle/degree and Michelson contrast between 2% and 8%. Target parameters and locations were held constant during an experimental session. Drift speed was 5°/sec in a direction perpendicular to the saccade required to acquire the target. Targets were identical on each trial with the exception of drift direction, which was selected randomly and independently for each target.
After acquiring fixation on a central fixation spot, the monkey waited for a variable delay (200–600 msec) before the fixation spot disappeared and two targets appeared on the screen simultaneously (Figure 1A). Targets appeared equidistant from the fixation spot and diametrically opposite one another. The monkeys had to make a saccadic eye movement to one of the two targets to select that target and obtain a possible reward allocated to it (see Reward Schedule section). Both targets disappeared at the start of the eye movement. If the saccadic eye movement shifted the monkey's gaze to within a 5–8°-diameter error window around the target within 400 msec of target appearance, the monkeys received a juice reward according to the variable reward schedule described below.
Quantifying the MIB
Eye position was monitored using the scleral search coil method (Judge, Richmond, & Chu, 1980; Fuchs & Robinson, 1966) and digitized at 500 Hz. Saccades were detected using previously described methods (Schafer & Moore, 2007). Directions of drifting gratings were perpendicular to the saccade required to choose the targets. Saccades directed to drifting-grating targets are displaced in the direction of visual motion, an effect previously referred to as the MIB (Schafer & Moore, 2007). The MIB for each trial was measured as the angular deviation of the saccade vector in the direction of the chosen target's drift, with respect to the mean saccade vector from all selections of that target within the session. This method of measuring deviation would yield approximately the same results as vertical displacement because the locations of targets were held constant throughout the session and angles were small, so that the angle is a good approximation of its tangent and the vertical displacement (the tangent of the angle times the horizontal distance of the target) is approximately proportional to the angle. To compare MIB values across sessions with different target contrasts and locations, we used z score values of the MIB in each session to avoid confounds caused by systematic biases.
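The per-trial computation described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names, the ±1 coding of drift direction, and the order of operations (z-scoring the angular deviations within a session, then signing them by drift direction) are assumptions made for the example.

```python
import math
from statistics import mean, pstdev

def wrap(angle):
    """Wrap an angle (radians) to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def session_mib(saccades, drift_signs):
    """Per-trial MIB for all saccades to ONE target within ONE session.

    saccades    : list of (x, y) saccade vectors relative to fixation
    drift_signs : +1 if the grating drifted counterclockwise relative to the
                  saccade direction, -1 if clockwise (hypothetical coding)
    """
    # Reference direction: the mean saccade vector for this target.
    ref = math.atan2(mean(s[1] for s in saccades),
                     mean(s[0] for s in saccades))
    # Angular deviation of each saccade from the reference direction.
    dev = [wrap(math.atan2(y, x) - ref) for x, y in saccades]
    # z-score within the session (assumed normalization).
    mu, sd = mean(dev), pstdev(dev)
    z = [(d - mu) / sd for d in dev]
    # Positive values mean the end point was displaced along the drift.
    return [sign * value for sign, value in zip(drift_signs, z)]

# Hypothetical saccades to a target on the right; drift up (+1) or down (-1).
saccades = [(10.0, 0.5), (10.0, -0.5), (10.0, 0.2), (10.0, -0.2)]
drift = [1, -1, 1, -1]
print(session_mib(saccades, drift))
```

In this toy example every saccade lands slightly toward the drift direction, so all MIB values come out positive.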
On the basis of the above equations, the reward probabilities on saccades to the left and right targets are equal at fL = r, corresponding to matching behavior, which is slightly suboptimal in this task. As shown in Figure 1C and D, an optimal reward rate is obtained via slight undermatching. As the value of s approaches zero, matching and optimal behaviors become closer to each other.
In our experiment, reward was assigned based on target location (left vs. right), and thus, the targets' motion directions were irrelevant for obtaining reward. Nevertheless, we considered the possibility that monkeys could incorrectly assign value to motion direction. We used various RL models to fit choice behavior to determine whether monkeys attributed reward outcomes to target locations or target motions and how they integrated these outcomes over trials to estimate subjective values and guide choice behavior. Therefore, we considered RL models that estimate subjective reward values associated with target locations as well as RL models that estimate subjective reward values associated with the motion of the two targets.
At the end of each trial, subjective reward values of one or both targets were updated depending on the choice and reward outcome on that trial. We considered different types of learning rules for how reward outcomes are integrated over trials and grouped these learning rules depending on whether they estimate a quantity similar to “return” (average reward per selection) or “income” (average reward per trial). More specifically, on each trial, the monkeys could update subjective reward value of the chosen target only, making the estimated reward values resemble local (in time) return. Alternatively, the monkeys could update subjective reward values of both the chosen and unchosen targets, making these values resemble local income. We adopted these two methods for updating subjective reward values because previous work has shown that both local return and income can be used to achieve matching behavior (Soltani & Wang, 2006; Corrado, Sugrue, Seung, & Newsome, 2005; Sugrue et al., 2004). In addition, subjective reward values for the chosen and unchosen targets could be discounted when updating these values on subsequent trials similarly or differently, and monkeys could learn differently from positive (reward) and negative (no reward) outcomes. We tested all these possibilities using four different types of RL models.
Finally, we also tested hybrid RL models in which subjective values of both target locations and motion directions were updated at the end of each trial and subsequently used to make decisions. Fitting based on these hybrid models was not significantly better than those using the RL models that consider only subjective values of target locations. Therefore, the results from these hybrid models are not presented here.
Model Fitting and Comparison
Effects of Subjective Reward Value on MIB
To estimate subjective reward values associated with a given target location, we used two methods of reward integration corresponding to income and return. To calculate the subjective income for a given target location on a given trial, we filtered the sequences of reward outcomes on preceding trials (excluding the current trial) using an exponential filter with a given time constant τ, assigning +1 to rewarded trials and Δn to nonrewarded trials if that target location was chosen and 0 if that target location was not chosen on the trial. To calculate the subjective return of a given target location, we filtered reward sequence on preceding trials (again excluding the current trial) in which that target location was chosen using an exponential filter with a given time constant τ, assigning +1 to rewarded trials and Δn to nonrewarded trials. Finally, we calculated the correlation between the MIB and the obtained filtered values for different values of τ and Δn.
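The two filters can be sketched as below. This is a sketch under stated assumptions rather than the authors' implementation: the function names, the 'L'/'R' coding of target locations, the unnormalized recursive form of the exponential filter (weight 1 for the most recent contributing trial, decaying by exp(−1/τ) per step), and counting lags for return in units of selections of that location are all choices made for illustration.

```python
import math

def filtered_income(choices, rewards, location, tau, delta_n):
    """Value of `location` before each trial, using preceding trials only.

    Every trial ages the trace; a trial contributes +1 (rewarded) or
    delta_n (unrewarded) if `location` was chosen, and 0 otherwise.
    """
    decay = math.exp(-1.0 / tau)
    value, trace = 0.0, []
    for choice, reward in zip(choices, rewards):
        trace.append(value)              # current trial is excluded
        if choice == location:
            outcome = 1.0 if reward else delta_n
        else:
            outcome = 0.0
        value = decay * value + outcome
    return trace

def filtered_return(choices, rewards, location, tau, delta_n):
    """Same filter, applied only to trials in which `location` was chosen."""
    decay = math.exp(-1.0 / tau)
    value, trace = 0.0, []
    for choice, reward in zip(choices, rewards):
        trace.append(value)              # current trial is excluded
        if choice == location:           # trace changes only on selections
            value = decay * value + (1.0 if reward else delta_n)
    return trace

# Hypothetical four-trial sequence; tau chosen so the decay per step is 0.5.
choices = ["L", "L", "R", "L"]
rewards = [1, 0, 1, 1]
tau = 1.0 / math.log(2.0)
print(filtered_income(choices, rewards, "L", tau, 0.0))
print(filtered_return(choices, rewards, "L", tau, 0.0))
```

The income trace for the left target keeps decaying on the trial in which the right target is chosen, whereas the return trace is frozen until the left target is selected again.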
We also compared the monkeys' choice behavior with the prediction of the matching law. The matching law states that the animals allocate their choices in a proportion that matches the relative reinforcement obtained by the choice options. In our experiment, this is equivalent to the relative fraction of left (respectively, right) choices to match the relative fraction of incomes on the left (respectively, right) choices. Therefore, to quantify deviations from matching, we calculated the difference between the relative fraction of choosing the more rewarding target (left when r > 50 and right when r < 50) and the relative fraction of the income for the more rewarding target. Negative and positive values correspond to undermatching (choosing the better option less frequently than the relative reinforcement) and overmatching, respectively.
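As an illustration of this deviation-from-matching measure, the following sketch computes it for one session. The function name, the 'L'/'R' coding, and the toy data are hypothetical.

```python
def matching_deviation(choices, rewards, better):
    """Choice fraction minus income fraction for the more rewarding target.

    choices : list of 'L'/'R' labels, one per trial
    rewards : list of 0/1 reward outcomes, one per trial
    better  : label of the more rewarding target in this session
    Negative values indicate undermatching, positive values overmatching.
    """
    choice_fraction = sum(1 for c in choices if c == better) / len(choices)
    reward_on_better = sum(r for c, r in zip(choices, rewards) if c == better)
    income_fraction = reward_on_better / sum(rewards)
    return choice_fraction - income_fraction

# Hypothetical session: left chosen on 6 of 10 trials, earning 5 of 6 rewards.
choices = ["L"] * 6 + ["R"] * 4
rewards = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
print(matching_deviation(choices, rewards, "L"))  # negative: undermatching
```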
We trained two monkeys to freely select between two visual targets via saccadic eye movement (Figure 1A). Saccades to each target resulted in delivery of a fixed amount of juice reward with a varying probability. Targets were stationary apertures of drifting gratings, and the reward probability was determined based on the location of the grating targets independently of the direction of visual motion contained within the gratings. More specifically, on a given trial, probabilities of reward on the left and right targets were determined by the reward parameter (r) and the choice history on the preceding 20 trials (Equation 1; Figure 1B). Critical for our experimental design, the motion contained within the targets caused the end points of eye movements made to those targets to be systematically biased in the direction of grating motion (MIB). We first show that this MIB can be used as a measure of sensitivity to visual motion on a trial-by-trial basis. Next, we use an exploratory approach to study whether and how the effects of reward on choice are different or similar to the effects of reward on sensitivity to visual motion measured by the MIB. In this approach, we rely on known effects of objective and subjective reward values on choice and then test those effects for the MIB.
MIB Measures Sensitivity to Visual Motion
The MIB of a saccadic eye movement quantifies the extent to which the end points of saccades directed toward the drifting gratings were biased in the direction of grating motion (Figures 1A and 2A). Despite the stationary position of the grating aperture, motion in the drifting sinusoid nonetheless induces a shift in the perceived position of the aperture in human participants (De Valois & De Valois, 1991) and biases saccadic end points in the direction of grating drift in monkeys (Schafer & Moore, 2007). By examining the MIB in different conditions, we established that it can provide a measure of sensitivity to visual motion even when the grating motion is not behaviorally relevant.
First, we found that the magnitude of the MIB depended on the grating contrast. More specifically, the MIB increased by 27% when the (Michelson) contrast of the grating increased from 2% to 3% (two-sided independent measures t test, p = 7.85 × 10−9; Figure 2B). Second, we observed that the MIB depended almost exclusively on the motion direction of the selected target as it was only slightly affected by nonmatching motion in the unchosen target (Figure 2C). Specifically, the average z score normalized MIB measured in two monkeys across all trials (mean = 0.38) was altered by only 9% when the unchosen target differed in direction of the grating motion. Together, these results demonstrate that the MIB in our task is sensitive to the properties of the sensory signal (grating motion direction and contrast) and thus can be used to measure the influence of internal factors such as subjective reward value on visual processing.
Effects of Objective Reward Value on Choice Behavior
To examine the effects of global and objective reward value on monkeys' choice behavior, we first measured how monkeys' choice behavior tracked the target location that was globally (session-wise) more valuable, which in our task is set by reward parameter r. We found that target selection was sensitive to reward parameter in both monkeys and the harvested reward rate was high, averaging 0.66 and 0.65 across all sessions (including those with penalty) in Monkeys 1 and 2, respectively (Figure 3A, B, D, and E). To better quantify monkeys' performance, we also computed the overall harvested reward by a model that selects between the two targets with the optimal but fixed choice probability in a given session (optimal static model; see example in Figure 1C in Methods) or a model in which the target with a higher probability of reward was chosen on each trial (optimal dynamic model; see Methods). We found that the performance of both monkeys was suboptimal; however, the pattern of performance as a function of reward parameter for Monkeys 1 and 2 resembled the behavior of the optimal static and dynamic models, respectively (Figure 3B and E). Because each session of the experiment for Monkey 2 was longer, we confirmed that there was no significant difference in task performance between the first and second halves of sessions for Monkey 2 (difference: mean = 0.003, SEM = 0.008; two-sided paired t test: p = .7, d = 0.03). Together, these results suggest that both monkeys followed the reward schedule on each session closely, whereas their choice behavior was suboptimal.
We also examined the global effects of reward on choice by measuring matching behavior. To that end, we compared choice and reward fractions in each session and found that both monkeys exhibited undermatching behavior (Figure 3C and F). More specifically, they selected the more rewarding location with a probability that was smaller than the relative reinforcement obtained on that location (Monkey 1 median [choice fraction − reward fraction] = −0.115; Wilcoxon signed rank test, p = 1.67 × 10−12, d = −1.35; Figure 3C inset; Monkey 2 median [choice fraction − reward fraction] = −0.03, p = 1.76 × 10−12, d = −0.83; Figure 3F inset). Furthermore, the degree of undermatching was larger for Monkey 1 than Monkey 2 (difference = −0.086; Wilcoxon rank sum test, p = 2.57 × 10−6, d = −0.37).
Effects of Objective Reward Value on Sensitivity to Visual Motion
In the previous section, we observed that choice behavior is affected by objective measures of reward value in a given session. We repeated similar analyses to examine whether objective reward values have similar effects on sensitivity to visual motion measured by MIB. To that end, we first computed the correlation between the difference in the session-based average MIB for saccades to the more and less rewarding target locations and reward parameter r in each session. However, we did not find any evidence for such correlation for either of the two monkeys (Spearman correlation; Monkey 1: r = .04, p = .8; Monkey 2: r = .11, p = .39). Second, we examined whether the average MIB for all saccades in a given session was affected by the overall performance in that session. Again, we did not find any evidence for correlation between the session-based average MIB and performance for either of the two monkeys (Spearman correlation; Monkey 1: r = −.07, p = .54; Monkey 2: r = .13, p = .21). Finally, we performed an analysis analogous to that of matching behavior to examine whether differential MIB on the two target locations is related to objective reward values of those locations. In this analysis, we computed correlation between the difference in average MIB on the better and worse target locations and the difference in total reward obtained on those locations but found no evidence for such correlation (Spearman correlation; Monkey 1: r = .003, p = .99; Monkey 2: r = .03, p = .83).
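For readers who want to reproduce this kind of session-level analysis, a Spearman correlation can be computed as the Pearson correlation of ranks. The sketch below is a plain-Python illustration with hypothetical per-session numbers; it does not compute p values, and the function names are not from the original study.

```python
from statistics import mean

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-session values: reward parameter r and the difference in
# mean MIB between the more and less rewarding target locations.
reward_parameter = [60, 70, 80, 90, 65, 75]
mib_difference = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
print(spearman(reward_parameter, mib_difference))
```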
Together, these results indicate that, unlike choice, the MIB is not affected by objective reward value of the foveated target or the overall harvested reward. Observing this dissociation, we next examined the effects of subjective reward value on choice and the MIB.
Effects of Subjective Reward Value on Choice Behavior
The analyses presented above show that the overall choice behavior was influenced by global or objective reward value of the two target locations in a given session. In contrast, sensitivity to visual motion was not affected by global or objective reward value. This difference between the influence of objective reward value on choice and MIB could simply reflect the fact that, because of task design, monkeys' choices and not MIB determine reward outcomes on current trials and influence reward probability on subsequent trials (reward probability was a function of r and monkeys' choices on the preceding trials). Therefore, we next examined similarities and differences between effects of subjective reward value on choice behavior and sensitivity to visual motion.
To investigate how reward outcomes were integrated over time to estimate subjective reward values and guide monkeys' choice behavior on each trial, we used multiple RL models to fit the choice behavior of individual monkeys on each session of the experiment. These models assume that selection between the two targets is influenced by subjective values associated with each target, which are updated on each trial based on reward outcome (see Methods). Although reward was assigned based on the location of the two targets (left vs. right) in our experiment, the monkeys could still assume that motion direction is informative about reward. Therefore, we considered RL models in which subjective values were associated with target locations as well as RL models in which subjective values were associated with the motion of the two targets, using four different learning rules. Considering the observed undermatching behavior, we grouped learning rules depending on whether they result in the estimation of subjective value in terms of local (in time) return or income.
In RLret models, only the subjective value of the chosen target (in terms of location or motion) was updated, making them return-based models. In RLInc(1) models, in addition to updating the subjective value of the chosen target, the subjective value of the unchosen target was discounted on the subsequent trials similarly to the subjective value of the chosen target, making these models income-based. In RLInc(2) models, the subjective values of chosen and unchosen targets were allowed to be discounted on the subsequent trials differently. Finally, in RLInc(3) models, we also assumed a change in the subjective value of the unchosen target or motion direction in addition to the discounting across trials. Because the subjective values of both chosen and unchosen target locations were updated on each trial in RLInc(2) and RLInc(3) models, we refer to these models as income-based similarly to RLInc(1). However, we note that only RLInc(1) models are able to estimate local income accurately.
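The distinction between the four learning rules can be made concrete with a small sketch. The model equations are not reproduced in this section, so everything below is an assumed parameterization for illustration only: a multiplicative discount factor (alpha) with an additive outcome term for the chosen target, a separate discount (alpha_u) and additive change (delta_u) for the unchosen target, and a logistic choice rule on the value difference. All parameter names and default values are hypothetical.

```python
import math

def update_values(v_chosen, v_unchosen, rewarded, model,
                  alpha=0.8, alpha_u=0.6,
                  delta_r=1.0, delta_n=0.0, delta_u=0.0):
    """One-trial update of subjective values (assumed parameterization)."""
    # Chosen target: discount the old value, then add the outcome term.
    v_chosen = alpha * v_chosen + (delta_r if rewarded else delta_n)
    if model == "RLret":
        pass                                   # unchosen value is left untouched
    elif model == "RLInc(1)":
        v_unchosen = alpha * v_unchosen        # same discount as the chosen value
    elif model == "RLInc(2)":
        v_unchosen = alpha_u * v_unchosen      # separate discount
    elif model == "RLInc(3)":
        v_unchosen = alpha_u * v_unchosen + delta_u   # discount plus a change
    else:
        raise ValueError(f"unknown model: {model}")
    return v_chosen, v_unchosen

def p_choose_left(v_left, v_right, beta=1.0):
    """Assumed logistic choice rule driven by the difference in values."""
    return 1.0 / (1.0 + math.exp(-beta * (v_left - v_right)))

for name in ("RLret", "RLInc(1)", "RLInc(2)", "RLInc(3)"):
    print(name, update_values(1.0, 1.0, True, name, delta_u=-0.1))
```

Under this parameterization, a discount factor near 1 corresponds to integration over many trials and a small discount factor to integration over only a few trials.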
We first compared the goodness-of-fit between the location-based and motion-based RLs using −LL, AIC, and BIC to test which of the two types of models can predict choice behavior better. Such comparisons based on the three measures yield the same results because the two types of models have the same number of parameters for a given learning rule. We found that, for both monkeys, all the location-based models outperformed the motion-based RLs (Table 1). This demonstrates that both monkeys attributed reward outcomes to target locations more strongly than to target motions and used subjective value attributed to target locations to perform the task.
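The equivalence of the three measures follows directly from their standard definitions, written here with k free parameters, n fitted trials, and LL the maximized log-likelihood (the exact scaling convention used for the reported values is not stated in this section, so the factor of 2 below is the textbook one):

```latex
\begin{aligned}
\mathrm{AIC} &= 2k + 2\,(-LL), \\
\mathrm{BIC} &= k \ln n + 2\,(-LL), \\
\Delta\mathrm{AIC} &= \Delta\mathrm{BIC} = 2\,\Delta(-LL)
  \quad \text{when both models have the same } k \text{ and are fit to the same } n \text{ trials.}
\end{aligned}
```

Because the penalty terms cancel, the sign of the difference, and hence the outcome of any sign-based comparison across sessions, is identical for −LL, AIC, and BIC.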
| | RLret | RLInc(1) | RLInc(2) | RLInc(3) |
|---|---|---|---|---|
| Monkey 1 | Δ(−LL, AIC, or BIC) = −5.48 | Δ(−LL, AIC, or BIC) = −7.19 | Δ(−LL, AIC, or BIC) = −6.53 | Δ(−LL, AIC, or BIC) = −8.06 |
| | p = 2.55 × 10−7 | p = 2.58 × 10−9 | p = 1.39 × 10−8 | p = 5.32 × 10−10 |
| Monkey 2 | Δ(−LL, AIC, or BIC) = −60.96 | Δ(−LL, AIC, or BIC) = −105.71 | Δ(−LL, AIC, or BIC) = −103.46 | Δ(−LL, AIC, or BIC) = −107.25 |
| | p = 2.74 × 10−21 | p = 2.58 × 10−26 | p = 2.58 × 10−26 | p = 2.58 × 10−26 |
Δ(−LL, AIC, or BIC) shows the median of the difference between location-based and motion-based RL models fitted for each session separately. Note that all differences in goodness-of-fit measures (based on −LL, AIC, and BIC) are similar because the number of parameters is the same across location-based and motion-based models. p Values indicate the significance of the statistical test (two-sided sign test) for comparing the goodness-of-fit between the location-based and motion-based RLs.
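A two-sided sign test on per-session differences can be sketched as follows. This is a generic exact binomial implementation with hypothetical data, not the authors' code; dropping sessions with a zero difference is an assumed convention.

```python
from math import comb

def sign_test_two_sided(differences):
    """Exact two-sided sign test for a median difference of zero.

    differences : per-session differences in a goodness-of-fit measure
                  (e.g., location-based minus motion-based)
    """
    n_pos = sum(1 for d in differences if d > 0)
    n_neg = sum(1 for d in differences if d < 0)
    n = n_pos + n_neg                      # zero differences are dropped
    k = min(n_pos, n_neg)
    # Probability of k or fewer of the rarer sign under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical example: 9 of 10 sessions favor the location-based model.
print(sign_test_two_sided([-1.2, -0.4, -3.1, -0.9, -2.2,
                           -0.1, -5.0, -0.7, -1.8, 0.3]))
```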
After establishing that monkeys used target location to integrate reward outcomes, we next examined how this integration was performed by comparing the quality of fit in location-based models with different learning rules. We found that, for Monkey 1, RLret and RLInc(1) models provided the best fit of choice data; although goodness-of-fit measures were not significantly different between these models, these models provided better fits than the RLInc(2) and RLInc(3) models (Figure 4). Interestingly, fitting choice behavior with the RLInc(1) model resulted in the discount factors (α) that were close to 1 for many sessions (mean and median of α were equal to 0.77 and 1.0, respectively). This result indicates that Monkey 1 integrated reward over many trials to guide its choice behavior. This is compatible with the pattern of performance as a function of reward parameter for this monkey (Figure 3B), which resembles the pattern of the optimal static model.
The same analysis for Monkey 2 revealed a similar integration of reward outcomes but on a different timescale. More specifically, we found that the RLInc(1) model provided the best fit for choice behavior as the goodness-of-fit in this model was better than the return-based model (RLret) and more detailed income-based models (RLInc(2) and RLInc(3); Figure 4). In contrast to Monkey 1, the estimated discount factors based on the RLInc(1) model were much smaller than 1 for many sessions for Monkey 2 (mean and median α were equal to 0.32 and 0.33, respectively). These results indicate that Monkey 2 integrated reward over a shorter timescale (a few trials) than Monkey 1 to guide its choice behavior. This is compatible with the pattern of performance as a function of reward parameter for this monkey (Figure 3E), which more closely resembles the pattern of the optimal dynamic model.
Together, fitting of choice behavior shows that both monkeys associated reward outcomes with the location of the chosen target. Moreover, both monkeys estimated subjective reward values in terms of income by integrating reward outcomes over multiple trials and used these values to make decisions.
Effects of Subjective Reward Value on Sensitivity to Visual Motion
Our experimental design allowed us to simultaneously measure choice and the MIB, as an implicit measure of sensitivity to visual motion, on each trial. We next examined whether subjective reward value based on the integration of reward outcomes over time influenced sensitivity to visual motion.
To that end, we first examined whether reward feedback had an immediate effect on the MIB in the following trial. Combining the data of both monkeys, we found that the MIB was larger in the trials that were preceded by a rewarded rather than an unrewarded trial (mean = 0.03, SEM = 0.009; two-sided t test: p = 6.95 × 10−4, d = 0.18). When considering data from each monkey individually, however, this effect only retained significance for Monkey 1 (Monkey 1: mean = 0.05, SEM = 0.01; two-sided t test, p = 6.5 × 10−4, d = 0.09; Monkey 2: mean = 0.01, SEM = 0.01; two-sided t test, p = .21, d = 0.09). These results suggest that the MIB is weakly affected by the immediate reward outcome in the preceding trial.
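The comparison of MIB after rewarded versus unrewarded trials amounts to splitting trials by the outcome of the preceding trial. A minimal sketch with hypothetical data (treating the input as a single session, so no trial is paired across a session boundary):

```python
from statistics import mean

def mib_after_outcome(mib, rewards):
    """Mean MIB following rewarded trials minus mean MIB following
    unrewarded trials, within a single session.

    mib     : per-trial (z-scored) MIB values
    rewards : per-trial 0/1 reward outcomes, same length as `mib`
    """
    # Pair each trial's MIB with the outcome of the trial before it.
    pairs = list(zip(mib[1:], rewards[:-1]))
    after_reward = [m for m, previous in pairs if previous]
    after_no_reward = [m for m, previous in pairs if not previous]
    return mean(after_reward) - mean(after_no_reward)

# Hypothetical five-trial session.
print(mib_after_outcome([0.1, 0.5, 0.2, 0.6, 0.1], [1, 0, 1, 0, 1]))
```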
In the previous section, we showed that the best model for fitting choice behavior was one that estimates subjective reward value based on the income on each target location and uses the difference in incomes to drive choice behavior (RLInc model; Figure 4). However, it is not clear if the MIB is influenced by subjective reward values of the two targets in a similar fashion. To test this relationship, we computed correlations between the trial-by-trial MIB and estimated subjective reward values of the chosen target location, the unchosen target location, and their sum and difference. We considered subjective reward values based on both income and return (see Effects of Subjective Reward Value on MIB section in Methods).
We made several key observations. First, we found that the MIB was positively correlated with subjective reward values of both the chosen and unchosen targets (Figure 5A, B, E, and F) and, as a result, was most strongly correlated with the sum of subjective reward values of the two targets (Figure 5C and G). In contrast to choice, the MIB was poorly correlated with the difference in subjective reward values of the chosen and unchosen targets (Figure 5D and H; Supplementary Figure 1 [http://ccnl.dartmouth.edu/Soltani_etal_20_JoCN/SuppFig1.pdf]). Therefore, choice was most strongly correlated with the difference in subjective reward values, whereas the MIB was most strongly correlated with the sum of subjective reward values from the two targets. Second, although the aforementioned relationships were true for subjective reward value based on return and income, we found that correlations between the MIB and subjective return values were stronger than correlations between the MIB and subjective income values (compare Figure 5 and Supplementary Figure 2 [http://ccnl.dartmouth.edu/Soltani_etal_20_JoCN/SuppFig2.pdf]). Third, the maximum correlation occurred for the values of τ at around 15–20 trials and for negative values of Δn, similarly for both monkeys. This indicates that, for both monkeys, the MIB was influenced by reward integrated over many trials, and the absence of reward on a given trial had a negative influence on the MIB on the following trials (Δn < 0).
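The logic of this correlation analysis can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the filter defined in Methods: reward outcomes are integrated with an exponential kernel of time constant τ (`tau`), an unrewarded trial contributes Δn (`delta_n`) instead of 0, return-like values are updated only for the chosen target, and income-like values decay for both targets on every trial. All function names and the toy data are hypothetical.

```python
import math


def filtered_values(choices, rewards, tau, delta_n, kind="return"):
    """Estimate (V_left, V_right) before each trial from earlier outcomes.

    choices: 0 = left, 1 = right; rewards: 1 = rewarded, 0 = unrewarded.
    kind="return": only the chosen target's value is updated (assumption).
    kind="income": both values decay on every trial (assumption).
    """
    decay = math.exp(-1.0 / tau)
    values = [0.0, 0.0]
    history = []
    for choice, reward in zip(choices, rewards):
        history.append((values[0], values[1]))  # values available on this trial
        outcome = 1.0 if reward else delta_n
        if kind == "income":
            values = [v * decay for v in values]
            values[choice] += outcome
        else:
            values[choice] = values[choice] * decay + outcome
    return history


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy session.
choices = [0, 0, 1, 0, 1, 1, 0, 1]
rewards = [1, 0, 1, 1, 0, 1, 0, 1]
mib = [0.10, 0.30, 0.15, 0.35, 0.50, 0.40, 0.55, 0.45]

values = filtered_values(choices, rewards, tau=15, delta_n=-0.5, kind="return")
value_sum = [vl + vr for vl, vr in values]
# Difference is taken as chosen minus unchosen on each trial.
value_diff = [v[c] - v[1 - c] for v, c in zip(values, choices)]

r_sum = pearson(mib, value_sum)
r_diff = pearson(mib, value_diff)
print(f"r(MIB, sum) = {r_sum:.2f}, r(MIB, difference) = {r_diff:.2f}")
```

Repeating this computation over a grid of `tau` and `delta_n` values, and for both kinds of integration, would give correlation maps analogous in spirit to those summarized in Figure 5.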
Considering that local choice fraction (fL in Equation 1) has opposite effects on prL and prR because of task design, we also tested the relationship between estimated subjective values (based on return) for the two target locations. We found that the correlation between estimated reward values depends on the values of τ and Δn and is not always negative (data not shown). Nevertheless, for all values of τ and Δn, the MIB was most strongly correlated with the sum of subjective reward values while being positively correlated with the value of both chosen and unchosen targets. These results indicate that the dependence of the MIB on the sum of subjective value is not driven by our specific task design.
Finally, to better illustrate distinct effects of reward on decision-making and visual processing, we used two sets of parameters (τ = 15 and Δn = 0; τ = 15 and Δn = −0.5) that resulted in significant correlations between choice and targets' subjective income values (Supplementary Figure 1) and between the MIB and targets' subjective return values in all cases (Figure 5). We then used these two sets of parameters and choice history of the monkeys on the preceding trials to estimate subjective income values and return values in each trial (see Effects of Subjective Reward Value on MIB section in Methods). We then grouped trials into bins according to estimated subjective reward values of TL (left target) and TR (right target) for choice or of the chosen and unchosen targets for the MIB and computed the average probability of choosing the left target and the average MIB for each bin. We found that the probability of choosing the left target for both monkeys was largely determined by the difference in subjective values of the left and right targets, as can be seen from contours being parallel to the diagonals (Figure 6A, B, E, and F). In contrast, the MIB was largely determined by the sum of subjective values, as can be seen from contours being parallel to the second diagonals (Figure 6C, D, G, and H). These results clearly demonstrate that reward has distinct effects on choice behavior and sensitivity to visual motion.
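The binning step described above can be sketched as follows. This is a minimal illustration that assumes both value axes share the same half-open bin edges; the function name `binned_mean`, the bin edges, and the toy data are hypothetical and not taken from this study.

```python
from collections import defaultdict


def binned_mean(x, y, z, edges):
    """Average z within 2-D bins of (x, y); bins are half-open [lo, hi)."""

    def bin_index(value):
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                return i
        return None  # value falls outside the binned range

    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi, zi in zip(x, y, z):
        key = (bin_index(xi), bin_index(yi))
        if None in key:
            continue  # skip trials outside the binned range
        sums[key] += zi
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in counts}


# Hypothetical toy data: estimated values of the left and right targets
# and whether the left target was chosen (1) or not (0) on each trial.
v_left = [0.1, 0.2, 0.7, 0.8]
v_right = [0.6, 0.7, 0.1, 0.2]
chose_left = [0, 0, 1, 1]

p_left = binned_mean(v_left, v_right, chose_left, edges=[0.0, 0.5, 1.0])
print(p_left)
```

Passing the values of the chosen and unchosen targets as `x` and `y` and the trial-wise MIB as `z` would give the corresponding average MIB per bin. If the binned quantity depends mainly on the difference of the two values, its contours run parallel to the main diagonal; if it depends mainly on the sum, they run parallel to the second diagonal.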
Discussion
Experimental paradigms with dynamic reward schedules have been extensively used in different animal models to study how reward shapes choice behavior on a trial-by-trial basis (Donahue & Lee, 2015; Li, McClure, King-Casas, & Montague, 2006; Lau & Glimcher, 2005; Barraclough et al., 2004; Sugrue et al., 2004; Herrnstein, Loewenstein, Prelec, & Vaughan, 1993). A general finding is that animals integrate reward outcomes on one or more timescales to estimate subjective reward value and determine choice. In contrast, the influence of reward on selective processing of visual information, which is often described as attentional deployment, has been mainly studied using fixed reward schedules with unequal reward outcomes (Barbaro et al., 2017; Hickey & Peelen, 2017; Hickey et al., 2010, 2014; Anderson et al., 2011a, 2011b; Della Libera & Chelazzi, 2006, 2009; Peck et al., 2009). The main findings from these studies are that targets or features associated with larger reward can more strongly capture attention and alter visual processing immediately or even after extended periods (reviewed in Anderson, 2013, 2016).
However, it has proven difficult to link the effects of reward on saccadic choice and selective processing of visual information mainly because of separate measurements of these effects in different tasks. Indeed, the poorly described relationship between reward expectation and the processing of visual information has been implicated as a confounding factor in the interpretation of many past behavioral and neurophysiological results (Maunsell, 2004, 2015). An exception to this is a study by Serences (2008) in which the author utilized a task with a dynamic reward schedule to demonstrate that the activity in visual cortex is modulated by reward history (i.e., integrated reward outcomes over many trials). Compatible with these results, we find that processing of visual information is affected by subjective reward value estimated by integration of reward outcomes over many trials.
Using tasks designed specifically to dissociate subjective reward value from a target's behavioral significance, or salience, a few studies have identified brain areas that respond primarily to the expected reward or the salience of a target (or both) in various species including rats (Lin & Nicolelis, 2008), monkeys (Roesch & Olson, 2004), and humans (Litt, Plassmann, Shiv, & Rangel, 2011; Cooper & Knutson, 2008; Jensen et al., 2007; Anderson et al., 2003). However, in these studies, the saliency signal observed in neural responses might reflect a number of different processes, such as motivation, attention, motor preparation, or some combination of these. In the present work, we exploited the influence of visual motion on saccades as an independent and implicit measure of visual processing during value-based decision-making. This enabled us for the first time to measure choice and visual processing simultaneously and to test whether reward has differential effects on these two processes.
Although motion was not predictive of reward and thus processing of motion direction was not required to obtain a reward, we found that, similar to decision-making, visual processing was influenced by subjective reward values of the two targets. However, the effects of subjective reward values on visual processing differed from their effects on choice in three ways. First, although choice was correlated most strongly with the difference between subjective values of chosen and unchosen targets, visual processing was most strongly correlated with the sum of subjective values of the two targets. The latter indicates that the overall subjective value of targets in a given environment could influence the quality of sensory processing in that environment. Second, choice was more strongly affected by the subjective income value of the target, whereas sensitivity to visual motion was more strongly affected by subjective return values of the targets. Third, the time constant of reward integration and the impact of no reward were different between decision-making and visual information processing. In contrast to subjective reward value, we found that objective reward value only affected choice and not sensitivity to visual motion. Together, these results point to multiple systems for reward integration in the brain.
We found certain differences between the results for the two monkeys that could indicate that they used different, idiosyncratic strategies for performing the task. For example, fitting results of RL models indicated that Monkey 1 used the reward history over many trials to direct its choice behavior. In contrast, Monkey 2 used the reward history over few trials to direct its choice behavior. This difference was also apparent in the correlation between choice and the difference in subjective values of the two target locations. Despite this difference in integration time constant, choice in both monkeys was most strongly correlated with the difference between estimated subjective values of the two targets. Furthermore, the MIB for both monkeys was most strongly correlated with the sum of estimated subjective values of the two targets, although they integrated reward outcomes on different timescales.
The observed differences in reward effects on visual processing and decision-making have important implications for the involved brain structures and underlying neural mechanisms. First, they suggest that brain structures involved in decision-making and processing of visual information receive distinct sets of value-based input, for example, ones that integrate reward over a different number of trials. The set of input affecting decision-making carries information about subjective reward value of individual targets, whereas the set that affects visual processing carries information about the sum of subjective values of targets. Indeed, there are more neurons in ACC and other prefrontal areas that encode the sum of subjective value of available options than the subjective value of a given option (Kim, Hwang, Seo, & Lee, 2009), and these neurons might contribute to enhanced sensory processing. In addition, it has been shown that the activity of basal forebrain neurons increases with the sum of subjective values of choice array options (Ledbetter, Chen, & Monosov, 2016), and this could enable basal forebrain to guide visual processing and attention based on reward feedback independently of how reward controls choice behavior (Monosov, 2020). Finally, the FEF also receives inputs from the supplementary eye field, which contains neurons whose activity reflects subjective reward value of the upcoming saccade (Chen & Stuphorn, 2015). Such input from the supplementary eye field could drive target selection in the FEF. Importantly, our findings can be used in future experiments to tease apart neural substrates by which reward influences visual processing and decision-making.
Second, a plausible mechanism that could contribute to the observed differences in the effects of reward is the differential influence of dopaminergic signaling on the functions of FEF neurons. Recent work demonstrates that the modulatory influence of the FEF on sensory activity within visual cortex is mediated principally by D1 receptors and that D2-mediated activity is not involved (Noudoost & Moore, 2011). However, activity mediated through both receptor subtypes contributes to target selection, albeit in different ways (Soltani, Noudoost, & Moore, 2013; Noudoost & Moore, 2011). This evidence indicates that the neural mechanisms underlying target selection and visual processing are separable if only in terms of the involvement of different dopaminergic signals. Considering the known role of dopamine in reward processing (Schultz, 2007) and synaptic plasticity (Calabresi, Picconi, Tozzi, & Di Filippo, 2007), these two dopaminergic signaling pathways may provide a mechanism for the separate effects of reward on sensory processing and selection.
Third, in most choice tasks with dynamic reward schedules, local subjective return and income values are typically correlated, and the question of which quantity is the critical determinant of behavior has been debated for many years (Soltani & Wang, 2006; Corrado et al., 2005; Gallistel, Mark, King, & Latham, 2001; Gallistel & Gibbon, 2000; Mark & Gallistel, 1994; Herrnstein & Prelec, 1991). The observation that differences in subjective income values are a better predictor of choice behavior may reflect the fact that income values provide information about which target is globally more valuable in each session of the task. In contrast, the dependence of visual processing on the sum of subjective return values is more unexpected. This indicates that visual processing may more strongly depend on target-specific reward integration because the return value of a given target is updated only after selection of that target.
Finally, the separable influences of reward could be crucial for flexible behavior required in dynamic and high-dimensional reward environments (Farashahi, Rowe, et al., 2017). For example, if the processing of a saccade target with multiple visual features depends on the sum of subjective reward values of the available targets, previously neglected information from the less rewarding targets could still be processed, which would improve exploration. Future studies are needed to test whether disruption of this processing can reduce flexibility in target selection and choice behavior.
We thank Vince McGinty for helpful comments on an earlier version of this article. We also thank D. S. Aldrich for technical assistance. This work was supported by NIH Grant EY014924 (T. M.), NIH Grant DA047870 (A. S.), NSF EPSCoR Award 1632738 (A. S.), an NDSEG fellowship (R. J. S.), and predoctoral NRSA fellowship F31MH078490 (R. J. S.).
Reprint requests should be sent to Alireza Soltani, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755.
These authors contributed equally to this work.