Competitions are part and parcel of daily life and require people to invest time and energy to gain advantage over others and to avoid (the risk of) falling behind. Whereas the behavioral mechanisms underlying competition are well documented, its neurocognitive underpinnings remain poorly understood. We addressed this using neuroimaging and computational modeling of individual investment decisions aimed at exploiting one's counterpart (“attack”) or at protecting against exploitation by one's counterpart (“defense”). Analyses revealed that during attack relative to defense (i) individuals invest less and are less successful; (ii) computations of expected reward are strategically more sophisticated (reasoning level k = 4 vs. k = 3 during defense); (iii) ventral striatum activity tracks reward prediction errors; (iv) risk prediction errors were not correlated with neural activity in either ROI or whole-brain analyses; and (v) successful exploitation correlated with neural activity in the bilateral ventral striatum, left OFC, left anterior insula, left TPJ, and lateral occipital cortex. We conclude that, in economic contests, coming out ahead (vs. not falling behind) involves sophisticated strategic reasoning that engages both reward and value computation areas and areas associated with theory of mind.
In his Principles of Political Economy, John Stuart Mill (1859) observed that “a great proportion of all efforts…[are] spent by mankind in injuring one another, or in protecting against injury.” Such appetite for “injuring others” and to defend against being injured has recently been documented in economic contest experiments in which individuals invest to obtain a reward at a cost to their competitor (henceforth attack) or to avoid losing their resources to their antagonist (henceforth defense; De Dreu & Gross, 2019; Chowdhury, Jeon, & Ramalingam, 2018; De Dreu, Kret, & Sligte, 2016; Wittmann et al., 2016; Chen & Bao, 2015; De Dreu, Scholte, van Winden, & Ridderinkhof, 2015; Zhu, Mathewson, & Hsu, 2012; Carter & Anderton, 2001; Grossman & Kim, 1996). These experiments showed that humans invest in injuring others through attacks and in protecting against injuring through defense, that investments in attack are typically less frequent and forceful than investments in defense, and that attack decisions disproportionally often fail and defenders relatively often survive (with ≈30% victories against ≈70% survivals; for a review, see, e.g., De Dreu & Gross, 2019).
Resonating with the idea that competition can be costly, participants during such attacker–defender contests typically waste about 40% of their wealth in fighting each other (De Dreu & Gross, 2019). Yet why people invest in attack and defense remains poorly understood. In fact, investing in injuring others and in protecting against injury may reflect an array of subjective “desires” (Charpentier, Aylward, Roiser, & Robinson, 2017; Delgado, Schotter, Ozbay, & Phelps, 2008; Dorris & Glimcher, 2004). Perhaps humans invest in attack and defense to maximize their personal earnings, as is typically assumed in standard economic theory (e.g., Ostrom, 1998). Relatedly, individuals may invest in attack and defense because of “competitive arousal” and rivalry (Delgado et al., 2008; Ku, Malhotra, & Murnighan, 2005). Finally, investment in attack and defense may be driven by a desire to minimize risk and uncertainty (Delgado et al., 2008; Kahneman & Tversky, 1984). Indeed, decision-making in competitive contests is inherently risky—investments are typically wasted and may result in no return (among attackers), wasted resources (when attacks were unexpectedly shallow and one thus overinvested in defense), or costly defeat (when attacks were unexpectedly tough). Humans factor in such risks when making decisions and are typically risk-averse (Tobler, O'Doherty, Dolan, & Schultz, 2007; Kuhnen & Knutson, 2005; Loewenstein, Weber, Hsee, & Welch, 2001).
Humans may hold conflicting desires when investing in attack and defense and may need to balance maximizing reward against minimizing risk. What individuals aim for and how possibly conflicting desires are regulated is difficult to infer from behavioral decision-making alone. To illustrate, consider a two-player contest in which one participant can invest in attack and the other participant in defense. When the attacker invests more than the defender, the attacker obtains all that the defender did not invest, and the defender is left with 0. If the attacker invests the same as or less than the defender, both sides earn their noninvested resources (De Dreu & Gross, 2019; Chowdhury et al., 2018; De Dreu, Gross, et al., 2016; De Dreu, Kret, et al., 2016; De Dreu et al., 2015; Carter & Anderton, 2001; Grossman & Kim, 1996). It follows that investments can increase attacker earnings and their competitive success and can prevent defenders from losing their remaining endowment to their attacker. At the same time, however, not investing resources eliminates the attacker's uncertainty about earnings from the contest, alongside the possibility of losing money. Defenders, in contrast, reduce such uncertainty and possibility of losing the contest by investing resources (Chowdhury et al., 2018).
We solved this problem of inference using a two-pronged approach inspired by recent work in cognitive neuroscience on learning from reward and risk prediction (Olsson, FeldmanHall, Haaker, & Hensler, 2018; Palminteri, Wyart, & Koechlin, 2017; Preuschoff, Quartz, & Bossaerts, 2008; Preuschoff & Bossaerts, 2007). First, from investments in attacker–defender contests, we computed, using a k-level reasoning approach, estimates of expected reward and expected risk (Zhu et al., 2012; Ribas-Fernandes et al., 2011; Botvinick, Niv, & Barto, 2009; Camerer, Ho, & Chong, 2004; Nagel, 1995; Stahl & Wilson, 1995; Harsanyi, 1967). The computational approach incorporates the intuition that the formation of expectations and beliefs in strategic interactions is recursive (i.e., I think that you think that I think that …) and can be more or less sophisticated (i.e., the number of recursions k). Using computational modeling and model comparison, we estimated for each investment in attack and defense the expected reward and risk, and concomitant reward and risk prediction errors. Our modeling thus defines (expected) reward as the (expected) monetary payoff from investment in attack and defense (e.g., Zhu et al., 2012) and (expected) risk as the (expected) variance of the reward prediction error (Preuschoff et al., 2008; Preuschoff & Bossaerts, 2007).
Second, and next to an exploratory whole-brain analysis potentially revealing currently unknown cues about the neural foundations of exploitation and protection, we linked prediction errors to a priori defined ROIs—the ventral striatum (VS) and the amygdala. We chose the VS because it has been extensively linked to reward processing and competitive success (viz. reward maximization; Metereau & Dreher, 2015; McNamee, Rangel, & O'Doherty, 2013; Balodis et al., 2012; Rudorf, Preuschoff, & Weber, 2012; Zhu et al., 2012; Xue et al., 2009; Preuschoff & Bossaerts, 2007). We chose the amygdala because of its involvement in low-level affective processing of threat to resources (viz. risk minimization; De Dreu et al., 2015; Choi & Kim, 2010; Baumgartner, Heinrichs, Vonlanthen, Fischbacher, & Fehr, 2008; Delgado et al., 2008; Nelson & Trainor, 2007; Phelps & LeDoux, 2005).
Participants and Ethics
Male participants (M = 25.31 years, n = 27) were recruited via an online recruiting system for participating in a neuroimaging study on human decision-making. Exclusion criteria were significant neurological or psychiatric history, prescription-based medication, smoking more than five cigarettes per day, and drug or alcohol abuse. Eligible participants were assigned to a session and instructed to refrain from smoking or drinking (except water) for 2 hr before the experiment, which lasted approximately 1.5 hr. They received a show-up fee of €30 in addition to the earnings from decision-making. The experiment involved no deception and was incentivized (see below), received ethics approval from the Psychology Ethics Committee of the University of Amsterdam, and complied with the guidelines from the American Psychological Association (6th edition). Participants provided written informed consent before the experiment and received a full debriefing afterward.
Experimental sessions were conducted between noon and 4 p.m., and participants were tested individually (also see De Dreu et al., 2015). Upon arrival, participants were escorted to a private cubicle where they read and signed an informed consent form. Participants received a booklet with instructions for the attacker–defender game (labeled Investment Task), containing several examples of investments and their consequences to both attacker (labeled Role A) and defender (labeled Role B) and several questions to probe understanding of the game structure and decision consequences. Neutral labeling was used throughout.
Upon finishing the instructions for the contest, the experimenter prepared the participant for neuroimaging. During the fMRI session, participants completed six functional runs, each consisting of a 20-trial block played as either attacker or defender. Participants thus alternated between the role of attacker and defender every 20 trials, with the starting order counterbalanced across participants. Importantly, we used a random partner matching one-shot protocol, eliminating reputation concerns (Zhu et al., 2012). In each session, participants made 60 investments as attacker and 60 as defender. For each investment trial, they received a prompt, randomly generated between 0 (indicating no investment) and 10 (indicating investment of the entire endowment) and used a button-press to adjust the given number up or down to indicate their desired investment. The duration of the selection period was self-paced and had an average length of 4.27 sec (SD = 3.43 sec; see Figure 1). After selecting their investments, participants waited an average of 6.08 sec (SD = 2.22 sec), at which point they received feedback about their counterpart's investment and were shown the respective payoffs to oneself and the other (who was randomly chosen on each trial from a pool of 150 attacker [defender] investments; for further detail, see De Dreu, Giacomantonio, Giffin, & Vecchiato, 2019; De Dreu et al., 2015). At the end of the experiment, participants received their participation fee and earnings by bank transfer (range €0–€8, with M = €5 for nonscanner participants, and €0–€33, with M = €19 for scanner participants). Accordingly, participant pay was private and conditioned on their performance.
The attacker–defender contest (Figure 1B) consists of two players: an attacker and a defender. Each player was endowed with e = €10 from which they could invest in the contest. Investments were always wasted, but if the investment by the attacker (x) exceeded that by the defender (y), that is, if x > y, the attacker obtained all of the defender's noninvested endowment (e − y). In this case, the attacker's total earning was 2e − x − y, and the defender earned 0. If, in contrast, the defender's investment matched or exceeded that by the attacker (y ≥ x), both defender and attacker earned what was left from their endowment (e − y and e − x, respectively; De Dreu et al., 2015, 2019; De Dreu, Gross, et al., 2016; De Dreu, Kret, et al., 2016).
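The payoff rule described above can be written as a small function. This is a minimal sketch of the stated rule only; the function name `contest_payoffs` and its argument names are illustrative and not part of the original task software.

```python
def contest_payoffs(x, y, endowment=10):
    """Return (attacker_payoff, defender_payoff) for attack investment x
    and defense investment y, following the rule stated in the text."""
    if not (0 <= x <= endowment and 0 <= y <= endowment):
        raise ValueError("investments must lie between 0 and the endowment")
    if x > y:
        # Attack succeeds: the attacker keeps e - x and takes the defender's e - y.
        return 2 * endowment - x - y, 0
    # Defense holds (y >= x): each side keeps its own noninvested endowment.
    return endowment - x, endowment - y
```

For instance, `contest_payoffs(5, 3)` corresponds to a successful attack, whereas `contest_payoffs(3, 3)` corresponds to a tie, which counts as survival for the defender.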
The attacker–defender contest has a contest success function $f = X^m/(X^m + Y^m)$, where $f$ is the probability that the attacker wins, $m \to \infty$ for $X \neq Y$, and $f = 0$ if $Y = X$. Assuming rational selfish play and risk neutrality, standard economic theory predicts that attackers and defenders use mixed strategies when investing. With e = €10 per trial (as used in the current experiment), the mixed strategies for attack (with probability of investing x denoted by p(x)) and defense (with probability of investing y denoted by p(y)) define a unique Nash equilibrium where expected investments in attack are both lower (x = 2.62) than in defense (y = 3.38) and less frequent (probability of attack [defense] = 60% [90%]). However, when attacks are made, they are expected to be more “forceful” (4.36 vs. 3.75 for defense).
Modeling Investment Behavior with k-level Sophistication
To compute individual estimates of expected reward and concomitant reward and risk prediction errors, we adapted the cognitive hierarchies framework developed in behavioral economics (Botvinick et al., 2009; Camerer et al., 2004; Nagel, 1995). The idea is that players hierarchically form beliefs about their opponents' behavior up to a certain level of cognitive sophistication (k-level). A k = 0 player invests randomly. At k = 1, the individual assumes that her opponent has k = 0 and finds an investment that maximizes her expected reward under this assumption. At k = 2, the individual assumes that her opponent has k = 1 and finds an investment that maximizes her own expected reward under the assumption that the opponent seeks to maximize his personal reward against a k = 0 player. This recursion can, in theory, continue infinitely, yet in our computational modeling, we limited k to k ≤ 5. Specifically, when $I_s$ represents a player's own investment (s stands for “self”) and $I_o$ represents their belief about the other player's investment (o stands for “other”), we can formally express:
k-levels 2 → n
For each k-level with k ≥ 2, the above procedure is iterated k times, with the k-level predictions of investments (needed to compute probabilities of success, expected rewards, and choice probabilities) being generated by the softmax at the preceding level (see Figure 2). Hence, each k-level model has k free parameters, which constitute the choice temperatures at each level, $\beta_k$.
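The recursion described above can be illustrated with a short sketch. It is not the authors' implementation and rests on several assumptions that are not stated in this excerpt: investments are integers from 0 to 10, the level-0 player chooses uniformly at random, the softmax divides expected reward by the temperature β, and only expected reward (not risk) is computed. All function names are hypothetical.

```python
import numpy as np

ENDOWMENT = 10
LEVELS = np.arange(ENDOWMENT + 1)  # assumed action set: integer investments 0..10
OTHER = {"attacker": "defender", "defender": "attacker"}


def expected_reward(role, opponent_probs):
    """Expected payoff of each own investment given beliefs about the opponent."""
    ev = np.zeros(len(LEVELS))
    for own in LEVELS:
        for other, p in zip(LEVELS, opponent_probs):
            if role == "attacker":
                payoff = 2 * ENDOWMENT - own - other if own > other else ENDOWMENT - own
            else:
                payoff = ENDOWMENT - own if own >= other else 0
            ev[own] += p * payoff
    return ev


def softmax(values, temperature):
    """Softmax choice rule; assumes the temperature divides the values."""
    z = (values - values.max()) / temperature
    weights = np.exp(z)
    return weights / weights.sum()


def k_level_policy(role, temperatures):
    """Choice probabilities of a level-k player in `role`.

    temperatures[i] is the softmax temperature used at level i + 1,
    so k = len(temperatures).
    """
    k = len(temperatures)
    # Roles alternate down the hierarchy; find the role of the level-0 player.
    current_role = role if k % 2 == 0 else OTHER[role]
    probs = np.full(len(LEVELS), 1.0 / len(LEVELS))  # level 0 invests randomly
    for temperature in temperatures:
        current_role = OTHER[current_role]  # the next level responds in the opposite role
        probs = softmax(expected_reward(current_role, probs), temperature)
    return probs
```

Under these assumptions, `k_level_policy("attacker", [1.0, 1.0, 1.0, 1.0])` would return the choice probabilities of a level-4 attacker, and the length of the temperature list mirrors the statement that a k-level model has k free parameters.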
Bayesian Model Comparison
To identify the model most likely to have generated a certain data set, model evidence (ME) was computed at the individual level for each model in the respective model space and fed to random-effects Bayesian model comparison using the mbb-vb-toolbox (mbb-team.github.io/VBA-toolbox/; Daunizeau, Adam, & Rigoux, 2014). This procedure estimates the expected frequencies (denoted PP) and the exceedance probability (denoted XP) for each model within a set of models, given the data gathered from all participants. PP quantifies the posterior probability that the model generated the data for any randomly selected participant. XP quantifies the belief that the model is more likely than all the other models of the model space. An XP > 95% for one model within a set is typically considered as significant evidence in favor of this model being the most likely.
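To make the two summary statistics concrete, the sketch below assumes that the group-level posterior over model frequencies has already been estimated and takes the form of a Dirichlet distribution, which is the usual formulation of random-effects model comparison. The Dirichlet parameters in the usage note are hypothetical, the function names are illustrative, and the code is not a reimplementation of the toolbox.

```python
import numpy as np


def expected_frequencies(alpha):
    """Posterior mean of the model frequencies under a Dirichlet(alpha) posterior."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()


def exceedance_probabilities(alpha, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the probability that each model is the most frequent."""
    rng = np.random.default_rng(seed)
    freqs = rng.dirichlet(alpha, size=n_samples)  # sampled model frequencies
    winners = freqs.argmax(axis=1)                # most frequent model in each sample
    return np.bincount(winners, minlength=len(alpha)) / n_samples
```

With hypothetical parameters such as `alpha = [20, 3, 2, 1, 1]` for five models, the first model would have both the largest expected frequency and the largest exceedance probability.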
To assess the reliability of our modeling approach, we performed model identifiability simulations (see Correa et al., 2018, for a similar approach). Choices from synthetic participants were generated for each task and each model by running our computational models, with model parameters sampled from their prior distribution: softmax temperatures β were drawn from a gamma distribution (random('Gamma', 1.2, 3)). For each model, we ran 10 simulations including 27 synthetic participants (n = 270), playing both attacker and defender for three blocks of 20 trials. Model identifiability was assessed by running the Bayesian model comparison on the synthetic data.
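The parameter-sampling step of such a simulation might look as follows. This sketch assumes that the two numbers in the gamma call are the shape (1.2) and scale (3) parameters, which is how MATLAB's `random` parameterizes the gamma distribution, and it only draws the temperatures; generating choices and refitting the models is omitted. The helper name `draw_temperatures` and the random seed are arbitrary choices.

```python
import numpy as np

N_SIMULATIONS = 10    # simulations per model, as stated in the text
N_PARTICIPANTS = 27   # synthetic participants per simulation


def draw_temperatures(k, rng):
    """One softmax temperature per reasoning level for one synthetic participant."""
    return rng.gamma(shape=1.2, scale=3.0, size=k)


rng = np.random.default_rng(seed=1)
# Temperatures for a k = 3 model: 10 simulations x 27 participants x 3 levels.
betas = np.array([[draw_temperatures(3, rng) for _ in range(N_PARTICIPANTS)]
                  for _ in range(N_SIMULATIONS)])
```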
MRI Data Acquisition, Preprocessing, and Data Analysis
Scanning was performed on a 3T Philips Achieva TX MRI scanner using a 32-channel head coil. Each participant played six blocks of the attacker–defender game in which functional data were acquired using a gradient-echo, echo-planar pulse sequence (repetition time = 2000 msec, echo time = 27.63 msec, flip angle = 76.1°, 280 volumes, field of view = 192² mm, matrix size = 64², 38 ascending slices, slice thickness = 3 mm, slice gap = 0.3 mm) covering the whole brain. For each participant, we also recorded a 3DT1 recording (3D T1 TFE, repetition time = 8.2 msec, echo time = 3.8 msec, flip angle = 8°, field of view = 256² mm, matrix size = 256², 160 slices, slice thickness = 1 mm) as well as respiration, pulse oximetry signal, and breath rate. Stimuli were back-projected onto a screen that was viewed through a mirror attached to the head coil.
Analyses were conducted with FSL (Oxford Centre for Functional MRI of the Brain Software Library; www.fmrib.ox.ac.uk/fsl) and custom scripts written in MATLAB. All fMRI data were prewhitened, slice-time corrected, spatially smoothed with a 5-mm FWHM gaussian kernel, motion corrected, and high-pass filtered. Functional images were registered to each participant's high-resolution T1 scan and subsequently registered to Montreal Neurological Institute (MNI) space.
Our primary goal was to determine if neural activity was modulated by the expected values and/or prediction errors from our reinforcement learning model. The entire fMRI analysis consisted of a three-level analysis: Level 1 was averaging within runs within participants, Level 2 was averaging across runs within participants, and Level 3 was testing for significance at the group level. We constructed three different general linear models (GLMs) to test for significant neural differences between attack and defense behavior as well as to see if attack and defense behavior correlated with our variables of interest. GLM-1 was meant to test for simple model-free differences between attacker and defender neural activity and consisted only of the selection and feedback epochs. GLM-2 was meant to determine if neural activity significantly correlated with investment magnitude during the selection time-phase and whether wins/losses significantly correlated with neural activity during feedback. To this end, it consisted of the following regressors: selection, selection modulated by investment (orthogonalized with respect to selection), feedback, and feedback modulated by wins/losses (z-scored and orthogonalized with respect to feedback). GLM-3 was meant to determine whether any neural activity correlated with the parameters calculated from our k-level model and contained the following regressors: selection, selection modulated by expected value (orthogonalized with respect to selection), selection delayed by 4 sec to capture the delayed nature of risk prediction (Preuschoff et al., 2008), delayed selection modulated by risk prediction (orthogonalized with respect to delayed selection), feedback, feedback modulated by the prediction error (z-scored and orthogonalized with respect to feedback), and feedback modulated by the risk prediction error (z-scored and orthogonalized with respect to feedback).
To mitigate spurious results from asymmetric parameter value ranges (Lebreton, Bavard, Daunizeau, & Palminteri, 2019), each parametric regressor was z-scored within each role, meaning both attacker and defender parametric regressors had identical variance.
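The two preprocessing steps applied to the parametric regressors, z-scoring within role and orthogonalizing with respect to the unmodulated regressor, can be sketched as below. This is an illustration on plain arrays rather than the FSL pipeline itself. The helper names are hypothetical, and the orthogonalization shown is a simple least-squares projection, which is an assumption about how the step is carried out.

```python
import numpy as np


def zscore_within_role(values, roles):
    """Z-score trial-wise values separately for each role (e.g., attacker, defender)."""
    values = np.asarray(values, dtype=float)
    roles = np.asarray(roles)
    out = np.empty_like(values)
    for role in np.unique(roles):
        mask = roles == role
        # Population standard deviation; it is zero if all values in a role are identical.
        out[mask] = (values[mask] - values[mask].mean()) / values[mask].std()
    return out


def orthogonalize(regressor, reference):
    """Remove from `regressor` the component explained by `reference`."""
    regressor = np.asarray(regressor, dtype=float)
    reference = np.asarray(reference, dtype=float)
    beta = (reference @ regressor) / (reference @ reference)
    return regressor - beta * reference
```

After `zscore_within_role`, the attacker and defender values have the same variance regardless of their raw ranges. A participant whose values are constant within a role has zero variance there, so the z-score is undefined for that role.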
We checked for multicollinearity by calculating the variance inflation factors (VIFs) for each regressor of interest (Mumford, Poline, & Poldrack, 2015) and found none to be problematic (all VIFs < 2.3). However, four participants made identical investments on every trial within at least one role, which resulted in rank-deficient models (four participants for GLM-2 and GLM-3). Specifically, two individuals made the exact same investment on all attack decisions, one individual made the exact same investment on all defense decisions, and one individual made the exact same investment during attack and defense. These participants had to be removed from the analysis. We tested for an interaction effect between role and each variable of interest by contrasting the relevant parameter estimates for attack and defense in a second-level within-participant fixed-effects analysis. Finally, we tested for group-level significance and corrected for multiple comparisons using FSL's FLAME 1 with the standard cluster-forming threshold of Z > 3.1 and clusters significant at p = .05. We ran additional control analyses with FSL's randomise using threshold-free cluster enhancement (Winkler, Ridgway, Webster, Smith, & Nichols, 2014; Smith & Nichols, 2009), and results were virtually identical.
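A variance inflation factor for a regressor is 1 / (1 − R²), where R² comes from regressing that regressor on all the others. The sketch below computes this for the columns of a design matrix. It is a generic illustration, not the authors' script, and the function name is hypothetical.

```python
import numpy as np


def variance_inflation_factors(design):
    """VIF for each column of an (n_timepoints, n_regressors) design matrix."""
    design = np.asarray(design, dtype=float)
    vifs = []
    for j in range(design.shape[1]):
        target = design[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(len(target)), np.delete(design, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r_squared = 1.0 - residual.var() / target.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)
```

Uncorrelated regressors give values near 1, and values grow as a regressor becomes predictable from the others. A constant column has zero variance, so its VIF is undefined, which parallels the rank-deficiency problem described above.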
We also conducted analyses within an a priori selected anatomical VS and within an a priori selected anatomical amygdala ROI. Both masks were obtained from the meta-analytic tool Neurosynth (Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011). We used the terms “ventral striatum” and “amygdala” in our search of Neurosynth, instead of using “reward” or “fear.” Avoiding psychological constructs such as reward or fear reduced possible bias in our ROIs in favor of a particular psychological construct. For our ROI analyses, we took the average value across every voxel within each ROI for each participant within the contrast of interest (e.g., attacker–reward prediction error) and then tested for significance with a paired-sample t test.
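The ROI procedure described above (averaging a contrast map over the voxels of a mask, then comparing roles with a paired-sample t test) can be sketched as follows. The arrays stand in for contrast images and masks that would normally be loaded from image files. The helper names are hypothetical and loading is deliberately left out.

```python
import numpy as np


def roi_mean(contrast_map, roi_mask):
    """Average contrast value across all voxels inside a binary ROI mask."""
    contrast_map = np.asarray(contrast_map, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    return float(contrast_map[roi_mask].mean())


def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for conditions a and b."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = len(diff)
    t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
    return float(t), n - 1
```

Here `a` and `b` would each hold one ROI average per participant (for example, attacker and defender estimates), so that the degrees of freedom equal the number of participants minus one.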
Earlier reports of the attacker–defender contest game analyzed investments in terms of the overall investment (range 0–10), the frequency of investment (all trials in which x or y > 0; range 0–60), and the force of investment (the amount invested on nonzero investment trials; range 1–10). For these measures, we find, consistent with earlier work, that individuals invested less often in attack than in defense, t(26) = −4.12, p = .0003; invested in attack less overall, t(26) = −8.56, p < .0001; and invested less forcefully in attack than in defense, t(26) = −7.81, p < .0001 (Figure 3B). Although individuals earned more from attack (noninvested resources + spoils of winning) than defense trials (noninvested resources in case of survival), t(26) = 43.91, p < .0001, they were less successful during attack than defense trials, t(26) = −7.22, p < .0001: As defenders they “survived” more often than they “killed” as attackers (Figure 3C).
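The three behavioral measures defined above can be computed from one participant's investments in a given role as in the following sketch. It assumes that overall investment and force are averages across trials, which matches the stated ranges. The function name is illustrative.

```python
def investment_measures(investments):
    """Return (overall, frequency, force) for one participant in one role.

    overall   : mean investment across all trials (0-10)
    frequency : number of trials with a nonzero investment
    force     : mean investment on nonzero-investment trials (1-10)
    """
    nonzero = [v for v in investments if v > 0]
    overall = sum(investments) / len(investments)
    frequency = len(nonzero)
    # Force is undefined if the participant never invested.
    force = sum(nonzero) / len(nonzero) if nonzero else float("nan")
    return overall, frequency, force
```

Applied to the 60 investments a participant made as attacker and, separately, the 60 made as defender, this yields the per-role values that enter the paired comparisons.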
In addition to the contrast between attack and defense, we examined investments in relation to predictions derived from standard economic theory that assumes rational self-interest and risk neutrality. Relative to mixed-strategy equilibrium predictions (see Methods), individuals invest more and more forcefully in defense (t(26) = 20.40, p < .0001, and t(26) = 18.467, p < .0001, respectively), but not more and not more forcefully in attack (t(26) = 1.46, p = .157, and t(26) = −0.78, p = .441, respectively; Figure 3A). Still, however, both attack and defense returned less earnings than predicted by standard economic theory (t(26) = −4.19, p = .00028, and t(26) = −40.56, p < .0001), and the frequency of both attacks and defense exceeded expectations based on rational selfish play (t(26) = 3.04, p = .0054, and t(26) = 30.26, p < .0001, respectively). Conversely, success rates for attacks (victories) and defense (survival) did not deviate from Nash equilibrium predictions (t(26) = −0.25, p = .804, and t(26) = −0.98, p = .336, respectively).
Neural Correlates of Attack and Defense
To examine the neural foundations of decision-making during attack and defense, we performed whole-brain analyses on the selection phase (when participants decided whether and how much to invest in attack or defense) and on the feedback phase (when participants received information about their opponent's investment and the resulting outcomes to oneself). Whereas no significant differences between attacker and defender were observed during selection, whole-brain analyses did show significant attacker–defender contrasts for the feedback phase. Specifically, during feedback, participants exhibited higher BOLD response during attack relative to defense in a cluster within the left anterior insula and inferior frontal gyrus (IFG; Figure 4: MNI coordinates: x = −40, y = 10, z = 16, Z = 4.88, cluster size = 1657, p = .0151, family-wise error [FWE]-whole brain).
In a follow-up analysis, we examined whether participants exhibited a correlation between neural activity and investments (during decision-making) and outcome (win/loss) during feedback. As before, no significant correlations were found between neural activity and investments during attack or defense, nor did the correlation differ between the two roles. During feedback, however, neural activity during attack covaried with wins and losses in clusters that included the bilateral VS, left OFC, left anterior insula, left TPJ, and lateral occipital cortex (Table 1 and Figure 4B). Activity in these same areas also correlated with wins/losses more during attack than defense but did not survive cluster-based multiple comparison correction (with p < .05, uncorrected). When participants processed feedback as defenders, there were no clusters that significantly covaried with wins and losses.
| Region | Peak x | Peak y | Peak z | Cluster Size | z Value | p (FWE-corr) |
| --- | --- | --- | --- | --- | --- | --- |
| Lateral occipital cortex | −22 | −74 | −8 | 1686 | 4.75 | .002 |
| TPJ/lateral occipital cortex | −26 | −84 | 46 | 1577 | 4.1 | .003 |
All statistics are corrected for multiple comparisons with FSL's FLAME 1.
Model-based Analyses of Decision-making and Neural Activity
As noted in the Methods section, we captured the computations at hand in attack and defense behavior using the cognitive hierarchies framework developed in behavioral economics (Botvinick et al., 2009; Camerer et al., 2004; Nagel, 1995). The idea is that players hierarchically form beliefs about their opponents' behavior, up to a certain level of cognitive sophistication (k-level; see Figure 2). We developed such computational models for hierarchies 1 up to 5 (see Methods section) and first verified that the behavior predicted by different levels of the cognitive hierarchies could be discriminated (see Methods/Model Identifiability section and Figure 5). We then fitted those models to our participants' investment data and ran a Bayesian model comparison to identify the hierarchy most likely to generate attacker- and defender-like behavior. Our results show that attackers are best described by a model with four levels of recursion (model K4, exceedance probability = 67.20%), whereas defenders are best described by a model with three levels of recursion (model K3, exceedance probability = 87.41%; Figure 5). From these models, we estimated, for each participant and each investment in attack and defense, the expected reward, risk prediction, and concomitant reward and risk prediction errors. These reward and risk prediction errors were then related to neural activity, using both whole-brain and ROI-based analyses.
Neural Correlates of Reward Prediction Errors
Within our VS ROI, there was a significant correlation between reward prediction errors and VS neural activity during attack, t(22) = 2.645, p = .0148, but not during defense, t(22) = −0.330, p = .745. Furthermore, this correlation between reward prediction errors and VS activity was stronger in attackers than in defenders, t(22) = 2.189, p = .0395 (see Figure 6A). Within our amygdala ROI, there was no significant correlation between neural activity and reward prediction errors during either attack, t(22) = 1.785, p = .088, or defense, t(22) = −1.507, p = .146, but there was a significant difference in correlations between the two roles, t(22) = 2.405, p = .025.
At the whole-brain level, we found a cluster in the right IFG that significantly correlated with reward prediction errors during attack (MNI coordinates: x = 48, y = 32, z = 12, Z = 4.55, cluster size = 681, p = .0391, FWE-whole brain; see Figure 6B). We note that this cluster is similar in location to regions found to covary with RTs, but in the present case the correlation between RT and reward prediction errors was not significant (r = −.0079, p = .654). Moreover, because all reported contrasts were computed for the feedback time-phase with the selection time-phase included as a covariate, RT was at least partially captured by our GLM. Accordingly, because the correlation between RT and reward prediction errors is nonsignificant here and RT is captured in the duration of the selection-phase decision-making, we conclude that RT does not account for this result.
There were no clusters at the whole-brain level that correlated with reward prediction errors during defense, nor were there any clusters that showed a significant difference in correlation between attacker and defender trials.
Neural Correlates of Risk Prediction Errors
We found that, within our VS ROI, there was no significant correlation between neural activity and risk prediction errors during either attack, t(22) = −1.622, p = .117, or defense, t(22) = 0.164, p = .871, nor was there a significant difference in correlations between the two roles, t(22) = −1.505, p = .145. The same was true in our amygdala ROI (attacker: t(22) = −0.588, p = .562; defender: t(22) = 0.363, p = .720; attacker vs. defender: t(22) = −0.647, p = .523) and at the whole-brain level.
Competition requires that people expend resources to win against other contestants and to avoid losing to them. These two core motives operating during competition (coming out ahead versus not falling behind) were examined here in a simple attacker–defender contest in which opposing individuals simultaneously invested, out of a personal endowment, into exploitative attacks and protective defense. Consistent with earlier work, we find that individuals invest less frequently and less intensely in economically “injuring others” than they invest in defending themselves against the threat of being economically injured (De Dreu & Gross, 2019, for a review). Computationally, we found that during attack individuals tend to utilize a higher level of cognitive recursion than during defense. We furthermore found attack behavior relative to defense behavior to be preferentially associated with neural regions associated with theory of mind and, within the VS, to be preferentially correlated with reward prediction errors.
What remained poorly understood is why and how people design their strategies of attack and defense. We argued that, in addition to reward maximization, investments in attack and defense may be driven by the desire to out-compete the antagonist as well as by the desire to minimize risk. We approached this issue with a computational framework modeling reward and risk prediction errors based on k-level reasoning in belief formation (Zhu et al., 2012; Camerer et al., 2004; Nagel, 1995). Our results at the neural level revealed no evidence for risk minimization. Instead and in line with earlier work (e.g., Zhu et al., 2012), we find good evidence that contestants aimed to maximize reward both during attack and defense. At the same time, however, we observed significant differences in the computation of expected reward and in the underlying neural activation during attack versus defense. Specifically, we found reward prediction errors during attack (more than during defense) to robustly correlate with neural activity in the VS and, using whole-brain analyses, the IFG.
Our computational modeling demonstrated that investments in attack are best fitted by a model containing four levels of recursion whereas investments in defense are best fitted by a model containing three levels of recursion. This suggests that individuals engage in more sophisticated reasoning about their antagonist's strategy during attack than defense. Indeed, our neuroimaging results revealed significant attack–defense contrasts in neural activation in regions often associated with perspective taking and “theory of mind”: the lateral occipital cortex, the IFG, and the TPJ (Engelmann, Schmid, De Dreu, Chumbley, & Fehr, 2019; Prochazkova et al., 2018; Van Overwalle, 2009). These results resonate with earlier work showing that temporarily dysregulating the IFG through theta burst stimulation affected investment behavior during attack but not defense (De Dreu, Kret, et al., 2016) and that reducing cognitive capacity before decision-making influenced attackers but not defenders (De Dreu et al., 2019). Combined, these results suggest that individuals engage neural regions for perspective taking and theory of mind during economic contests to out-smart and exploit their antagonist.
Results for neural activity were specific to the feedback phase, when contest outcomes were presented, and were not observed during the selection phase, when investment decisions were implemented. Possibly, different neurocognitive operations govern the implementation of decisions and the processing of feedback. During implementation, controlled deliberation may be more or less active, and this may relate to activity in prefrontal regions involved in executive control. Perhaps the extent to which cognitive control and deliberation are engaged during selection does not depend on the specific role decision-makers perform. During feedback, learning and updating operations may be active, and this may relate to neural activation in regions involved in value computation and emotion processing (Behrens, Hunt, & Rushworth, 2009; Yacubian et al., 2006). Indeed, we found neural activity in the VS to be meaningfully related to reward prediction errors (also see Stallen et al., 2018; Zhu et al., 2012; Yacubian et al., 2006; O'Doherty et al., 2004). In contrast to expectations, however, we did not find differential activity in the amygdala, nor did we find amygdala activity to be related to behavioral indicators processed during feedback. Possibly, contestants process feedback in an emotionally detached and rather cognitive manner aimed at revising and updating their (future) strategy for attack and defense.
Our study design included only male participants, and extrapolating conclusions to female participants may be nontrivial. Intuitively, competitive success and reward maximization may fit an (evolved) male psychology, whereas risk minimization fits an (evolved) female psychology (Niederle & Vesterlund, 2011; Croson & Gneezy, 2009; Spreckelmeyer et al., 2009). At the same time, however, male and female participants tend to perform similarly in the attacker–defender contest studied here (De Dreu et al., 2019). Future work is needed to test whether the neurocognitive mechanisms are similar as well, which would further contradict the intuitive hypothesis derived from evolutionary psychology.
Competitions are part and parcel of human life and can be wasteful. In the current contest, participants destroyed roughly 40% of their wealth in attempts at “injuring others and protecting against being injured” (viz. Mill, 1859). Our neurocomputational approach suggested that injuring others is done through rather sophisticated cognitive reasoning, with the key aim to understand the counterpart's strategy selection such that personal rewards can be optimized. When investing in attack, more than when investing in defense, people engage more sophisticated cognitive recursion. Furthermore, neural structures associated with theory of mind and reward processing are recruited more during attack than during defense decisions. Perhaps mentalizing serves not only empathy and prosocial decision-making but also the strategic goal of reward maximization through exploitation and subordination.
Financial support was provided by a seed grant from the Amsterdam Brain and Cognition Priority Area to C. K. W. D. D., F. V. W., and R. R., an SNF Ambizione grant (PZ00P3_174127) to M. L., and the Spinoza Award from the Netherlands Science Foundation (NWO SPI-57-242) to C. K. W. D. D.
C. K. W. D. D., R. R., and F. V. W. conceived of the study and designed the behavioral experiment. H. S. S. implemented and coordinated neuroimaging. M. R.-G. and M. L. contributed the computational model and analyzed the data. M. R.-G. and C. K. W. D. D. wrote the article and incorporated coauthor revisions. The authors note that this study capitalizes on computational models that were conceived after data collection and that were not part of the original set of predictions and analysis plans.
Reprint requests should be sent to Michael Rojek-Giffin, Institute of Psychology, Leiden University, Wassenaarseweg 52, 2300 RB, Leiden, The Netherlands, or via e-mail: email@example.com, or Carsten K. W. De Dreu, Institute of Psychology, Leiden University, Wassenaarseweg 52, 2300 RB, Leiden, The Netherlands, or via e-mail: firstname.lastname@example.org.
The attack–defense contest belongs to a class of asymmetric conflict games in which one player competes to maximize personal gain and the counterpart competes to prevent exploitation (De Dreu & Gross, 2019; Dechenaux, Kovenock, & Sheremeta, 2015). Included in this class of asymmetric games are the hide-and-seek game (Bar-Hillel, 2015; Flood, 1972), the matching pennies game (Goeree, Holt, & Palfrey, 2003), the inspection game (Nosenzo, Offerman, Sefton, & van der Veen, 2014), and the best shot/weakest link game (Chowdhury & Topolyan, 2016; Clark & Konrad, 2007). Across these games, humans invest to maximize wealth and/or to minimize risk of losing.
The sample was the same as used in De Dreu et al. (2015), which used a crossover design to examine the behavioral and neural effects of oxytocin (vs. placebo) administration. Here, we only analyze investments made under placebo. Moreover, our earlier report only considered trials in which participants' decisions affected only themselves and did not include those decision trials in which decisions also affected two other individuals within their group. Here, we also include those previously unanalyzed trials. Because this manipulation revealed no differences, we collapsed across these two conditions. In short, the current study shares 25% of its analyzed data with the previous one, asks a different research question, and uses distinctly different analytic techniques.
Specifically, the mixed-strategy equilibrium is computed as follows: Attack: p(x = 1) = 2/45, p(x) = p(x − 1)[(12 − x)/(10 − x)] for 2 ≤ x ≤ 6, p(x = 0) = 1 − [p(x = 1) + … + p(x = 6)] = 0.4, and p(x) = 0 for x ≥ 7; Defense: p(y) = 1/(10 − y) for 0 ≤ y ≤ 5, p(y = 6) = 1 − [p(y = 0) + … + p(y = 5)] ≈ 0.15, and p(y) = 0 for y ≥ 7 (also see De Dreu et al., 2015).
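The recursion above can be tabulated with exact rational arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the `endowment` parameter (10 units, suggested by the denominators in the formulas) is an assumption used solely to pad the distributions with zeros for investments of 7 and higher.

```python
from fractions import Fraction


def attack_equilibrium(endowment=10):
    """Attack probabilities p(x) from the recursion stated in the text."""
    p = {1: Fraction(2, 45)}
    for x in range(2, 7):
        # p(x) = p(x - 1) * (12 - x) / (10 - x) for 2 <= x <= 6
        p[x] = p[x - 1] * Fraction(12 - x, 10 - x)
    # The remaining probability mass falls on not attacking at all.
    p[0] = 1 - sum(p.values())
    for x in range(7, endowment + 1):
        p[x] = Fraction(0)
    return dict(sorted(p.items()))


def defense_equilibrium(endowment=10):
    """Defense probabilities p(y) from the formula stated in the text."""
    p = {y: Fraction(1, 10 - y) for y in range(0, 6)}
    # The remaining probability mass falls on y = 6.
    p[6] = 1 - sum(p.values())
    for y in range(7, endowment + 1):
        p[y] = Fraction(0)
    return p


def mean_investment(dist):
    """Expected investment under a distribution {investment: probability}."""
    return sum(k * prob for k, prob in dist.items())


if __name__ == "__main__":
    attack = attack_equilibrium()
    defense = defense_equilibrium()
    print("p(x = 0) =", attack[0])    # 2/5
    print("p(y = 6) =", defense[6])   # 389/2520
    print("mean attack investment =", float(mean_investment(attack)))
    print("mean defense investment =", float(mean_investment(defense)))
```

Working in fractions shows that p(x = 0) is exactly 2/5, whereas p(y = 6) equals 389/2520, which rounds to 0.15.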