Abstract

Intention recognition is ubiquitous in most social interactions among humans and other primates. Despite this, the role of intention recognition in the emergence of cooperative actions remains elusive. Resorting to the tools of evolutionary game theory, herein we describe a computational model showing how intention recognition coevolves with cooperation in populations of self-regarding individuals. By equipping some individuals with the capacity of assessing the intentions of others in the course of a prototypical dilemma of cooperation—the repeated prisoner's dilemma—we show how intention recognition is favored by natural selection, opening a window of opportunity for cooperation to thrive. We introduce a new strategy (IR) that is able to assign an intention to the actions of opponents, on the basis of an acquired corpus consisting of possible plans achieving that intention, as well as to then make decisions on the basis of such recognized intentions. The success of IR is grounded on the free exploitation of unconditional cooperators while remaining robust against unconditional defectors. In addition, we show how intention recognizers do indeed prevail against the best-known successful strategies of iterated dilemmas of cooperation, even in the presence of errors and reduction of fitness associated with a small cognitive cost for performing intention recognition.

1 Introduction

Intention recognition can be found abundantly in many kinds of interactions and communications, not only in humans but also in many other species [14, 40, 76, 87]. It is so critical for human social functioning and the development of key human abilities, such as language and culture, that it is reasonable to assume it has been shaped by natural selection [87]. It is defined, in general terms, as the process of becoming aware of the intention of another agent and, more technically, as the problem of inferring an agent's intention through its actions and their effects on the environment [12, 26, 34]. Knowledge about the intentions of others in a situation may enable one to plan in advance, either to secure successful cooperation or to deal with potentially hostile behaviors [3, 57, 82], which may lead to disastrous collective outcomes [41, 60]. Intention recognition, and recognition processes or heuristics in general, are important mechanisms used by bounded human rationality to cope with real-life complexity [3, 65, 68, 69]. In contrast to the assumption of perfect rationality, we usually have to make decisions in complicated and ill-defined situations, with incomplete information and time constraints [69, 75], under computational processing limitations, including limited memory capacity [31, 32, 69]. Recognition processes, based on stored knowledge in the form of generic patterns, are known to play a major role in that respect. In problems with such complications, we look for patterns; using them, we simplify the problem by constructing temporary internal models as working hypotheses [3, 69]. This process becomes even more important in interactive settings, where the achievement of a goal by an agent does not depend solely on its own actions, but also on the decisions and actions of others, especially when the possibility of communication is limited [26, 36, 83]. Agents cannot rely on others behaving with perfect, or even improved, rationality, and therefore need to be able to recognize their behaviors and even to predict the intentions behind those surface behaviors. Additionally, in more realistic settings where deception may offer additional profits, individuals often attempt to hide their real intentions and make others believe in bogus ones [22, 51, 56, 62, 71, 76].

Hence, undoubtedly, a capability of intention recognition would confer on its holder great evolutionary benefits. However, although there have been extensive studies of that capability in artificial intelligence, philosophy, and psychology for several decades [9, 10, 12, 20, 26, 34, 50], few have addressed the evolutionary roles and aspects of intention recognition with regard to the emergence of cooperation.

A crucial issue, traversing areas as diverse as biology, economics, artificial intelligence, political science, and psychology, is to explain the evolution of cooperation: how natural selection can lead to cooperative behavior [4, 23, 67]. A cooperator may be seen as someone who pays a cost for another individual to receive a benefit. Cost and benefit are measured in terms of success or reproductive fitness. If two individuals simultaneously or independently decide to cooperate or not, the best possible response will be to try to receive the benefit without paying the cost. So, why does natural selection equip individuals with altruistic tendencies? Various mechanisms responsible for promoting such behavior have been recently proposed [43, 67], including kin and group ties [77, 85], different forms of reciprocity [29, 45, 47, 48, 81], and networked populations [59, 61, 63, 72]. In contradistinction, here we address explicitly how cooperation may emerge from the interplay between population dynamics and individuals' cognitive abilities, namely the ability to perform intention recognition. As usual, our study is carried out within the framework of evolutionary game theory (EGT) [27, 67].

In this work we take a first step toward employing intention recognition within an evolutionary setting and repeated interactions. As in the traditional framework of direct reciprocity [81], intention recognition is performed using information about past direct interactions. As usual, the inputs of an intention recognition system are a set of conceivable intentions and a set of plans achieving each intention—given in terms of either a plan library [12, 20] or a plan corpus [2, 7, 21]. In this EGT context, conceivable intentions are the strategies already known to the intention recognizer, whose recognition model is learned from a plan corpus consisting of sequences of moves (called plan sessions) for the different strategies. There have been several successful corpus-based intention recognition models in the literature [2, 7, 8], and we adjust a preexisting one to the current work in lieu of supplying a completely novel one (see Section 2.5). The rationale of the corpus-based approach relies on the idea of nature-nurture coevolution or experience inheritance [54, 66]: The corpus represents ancestors' experience in interacting with known strategies. Additionally, intention recognizers can use themselves as a framework for learning and understanding those strategies by self-experimenting with them—as suggested by the well-known “like me” framework [39, 40]. This is often addressed in the context of the theory-of-mind theory, neurologically relying in part on mirror neurons, at several cortical levels, as supporting evidence [28, 42, 55]. Closely related to this second point, note that, contrary to the formal EGT framework being used here, there have been several works (e.g., [35, 64, 74, 89]) showing the evolution of mind that are based on genetic algorithm simulations. Indeed, intention recognition can be considered as an elementary component of theory-of-mind [52, 86].

In addition, to the best of our knowledge, although there is a large body of literature on learning in games [19, 37] (e.g., reinforcement learning [58, 83]), very little attention has been paid to studying how some cognitive ability that requires learning (such as the ability of intention recognition in this work) fares in an evolutionary setting, particularly within the EGT framework. On the other hand, there have been some efforts to study the effects of increased memory size in evolutionary settings (e.g., see [24]), though individual learning is not considered. In contrast to this literature, our aim is to provide a computational model showing how the cognitive ability of intention recognition, which is so critical and ubiquitous in humans' activities [40, 76, 87], is a viable possibility that might have been retained by natural selection and coevolves with the collective need to achieve high cooperative standards.

We offer a method to acquire an intention-based decision-making model from the plan corpus, stating what to play with a given co-player in view of the recognized intention and the game's current state. The intention-based decision maker attempts to achieve the greatest expected benefit for himself/herself, taking advantage of the knowledge about the co-player's intention. The model is discussed in Section 2.6.

In addition, we show that our intention recognizers prevail against the best-known successful strategies of repeated dilemmas of cooperation, including tit for tat, generous tit for tat, and win-stay, lose-shift (described in Section 2.3), even in the presence of noise. Finally, as higher cognitive skills may imply a certain cost, we also discuss the implications of adding a (biological) cost of performing intention recognition.

2 Materials and Methods

2.1 Interaction between Agents

Interactions are modeled as symmetric two-player games defined by the payoff matrix
        C   D
   C  ( R   S )
   D  ( T   P )
A player who chooses to cooperate (C) with someone who defects (D) receives the sucker's payoff S, whereas the defecting player gains the temptation to defect, T. Mutual cooperation (defection) yields the reward R (punishment P) for both players. Depending on the ordering of these four payoffs, different social dilemmas arise [37, 61, 67]. Namely, in this work we are concerned with the prisoner's dilemma (PD), characterized by T > R > P > S. In a single round, it is always best to defect, but cooperation may be rewarded if the game is repeated. In this case, it is additionally required that mutual cooperation is preferred over an equal probability of unilateral cooperation and defection (i.e., 2R > T + S); otherwise, alternating between cooperation and defection would lead to a higher payoff than mutual cooperation. For convenience and a clear representation of results, we later mostly use the donor game [67]—a well-known special case of the PD—where T = b, R = b − c, P = 0, and S = −c, satisfying b > c > 0, where b and c stand respectively for "benefit" and "cost" (of cooperation).

We do not maintain that the repeated PD is necessarily a good model for the evolution of human cooperation [6, 70], but we nevertheless want to demonstrate the effectiveness of the intention recognition strategy against such well-explored PD strategies as tit for tat and win-stay, lose-shift (described in Section 2.3). In addition, because the PD is the most difficult dilemma for cooperation to emerge from, we envisage that the intention recognition strategy would do well in other dilemmas such as the stag-hunt game [70].

In a population of N individuals interacting via a repeated (or iterated) PD, whenever two specific strategies are present in the population, say A and B, the (per-round) payoff or fitness of an individual with strategy A in a population with k As and N − k Bs can be written as
  Π_A(k) = [ (k − 1) Σ_{j=1..m} π_A,A(j) + (N − k) Σ_{j=1..m} π_A,B(j) ] / [ m (N − 1) ]     (1)
where π_A,A(j) (respectively, π_A,B(j)) stands for the payoff obtained in round j by an A strategist, resulting from the mutual behavior in an interaction with an A (respectively, B) strategist (as specified by the payoff matrix above), and m is the total number of rounds of the PD. As usual, instead of considering a fixed number of rounds, upon completion of each round there is a probability w that yet another round of the game will take place, resulting in an average number 〈m〉 = (1 − w)^(−1) of rounds per interaction [67].
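
For concreteness, this per-round payoff can be tabulated directly. The following is a minimal sketch (the function and variable names are ours, not the authors' code), assuming pi_AA and pi_AB already hold the round-by-round payoffs of an A strategist against an A and a B co-player, respectively.

```python
def average_payoff_A(k, N, pi_AA, pi_AB):
    """Per-round payoff of an A strategist among k As and N - k Bs (self-play excluded)."""
    m = len(pi_AA)  # number of rounds recorded
    return ((k - 1) * sum(pi_AA) + (N - k) * sum(pi_AB)) / (m * (N - 1))
```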

2.2 Evolutionary Dynamics

The accumulated payoff from all interactions (defined in Equation 1) emulates the individual fitness or social success, and the most successful individuals will tend to be imitated by others, implementing a simple form of social learning [67]. A strategy update event is defined in the following way, corresponding to the so-called pairwise comparison [73, 78]. At each time step, one individual i with a fitness fi is randomly chosen for behavioral revision. The individual i will adopt the strategy of a randomly chosen individual j with fitness fj with a probability given by the Fermi function (from statistical physics)
  p = [ 1 + e^(−β (f_j − f_i)) ]^(−1)
where the quantity β, which in physics corresponds to an inverse temperature, controls the intensity of selection. When β = 0 we obtain the limit of neutral drift, and with increasing β one strengthens the role played by the game payoff in individual fitness and in behavioral evolution [78, 79].
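
A single update event of this rule could be sketched as follows (an illustrative sketch, not the authors' implementation; all names are ours).

```python
import math
import random

def pairwise_comparison_step(strategies, fitness, beta):
    """One strategy-update event: a random individual i imitates a random j
    with the Fermi probability [1 + exp(-beta * (f_j - f_i))]^(-1)."""
    i, j = random.sample(range(len(strategies)), 2)
    p_imitate = 1.0 / (1.0 + math.exp(-beta * (fitness[j] - fitness[i])))
    if random.random() < p_imitate:
        strategies[i] = strategies[j]
```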

In the absence of mutations, the end states of evolution are inevitably monomorphic, as a result of the stochastic nature of the evolutionary dynamics and the update rule. As we are interested in a global analysis of the population dynamics with multiple strategies, we further assume that, with a small probability μ, individuals switch to a randomly chosen strategy, freely exploring the space of possible behaviors. For such a small probability of mutation or exploration, a single mutant appearing in a monomorphic population will fixate or become extinct long before the occurrence of another mutation, and for this reason the population will spend all of its time with a maximum of two strategies present simultaneously [18, 25, 29, 62, 84]. This allows one to describe the evolutionary dynamics of our population in terms of a reduced Markov chain of size equal to the number of different strategies, where each state represents a possible monomorphic end state of the population associated with a given strategy, and the transitions between states are defined by the fixation probabilities of a single mutant of one strategy in a population of individuals who adopt another strategy. The resulting stationary distribution characterizes the average time the population spends in each of these monomorphic states, and can be computed analytically [18, 25, 29, 33, 62, 84] (see below).

In the presence of two strategies the payoffs of each are given by Equation 1, whereas the probability of changing the number k of individuals with strategy A (by ±1 in each time step) in a population with N − k B-strategists is
  T^±(k) = ((N − k)/N) (k/N) [ 1 + e^(∓β (Π_A(k) − Π_B(k))) ]^(−1)
The fixation probability of a single mutant with a strategy A in a population of N − 1 Bs is given by [78]
  ρ_B,A = ( Σ_{i=0..N−1} ∏_{j=1..i} λ_j )^(−1)
where λ_j = T^−(j)/T^+(j).

In the limit of neutral selection (i.e., β = 0), λ_j = 1, and thus ρ_B,A = 1/N. Considering a set {1, …, n_S} of different strategies, these fixation probabilities determine the transition matrix M = [T_ij] (i, j = 1, …, n_S) of a Markov chain, with T_ij = ρ_ji/(n_S − 1) for j ≠ i and T_ii = 1 − Σ_{j≠i} T_ij. The normalized eigenvector associated with the eigenvalue 1 of the transpose of M provides the stationary distribution described above [33], describing the relative time the population spends adopting each of the strategies.
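
A compact sketch of this computation is given below; function and variable names are ours. The fixation probability follows the product formula above (for the pairwise comparison rule, λ_j reduces to e^(−β(Π_A(j) − Π_B(j)))), and the stationary distribution is read off the eigenvector of the transposed transition matrix associated with the eigenvalue 1.

```python
import math
import numpy as np

def fixation_probability(payoff_A, payoff_B, N, beta):
    """Probability that a single A mutant takes over a population of N - 1 B players."""
    total, running_product = 1.0, 1.0
    for j in range(1, N):
        # lambda_j = T^-(j)/T^+(j); for the Fermi update this equals exp(-beta*(Pi_A - Pi_B))
        running_product *= math.exp(-beta * (payoff_A(j) - payoff_B(j)))
        total += running_product
    return 1.0 / total

def stationary_distribution(rho):
    """rho[i][j] is assumed to hold the fixation probability of a single j-mutant
    in a population of i-players (our indexing convention)."""
    nS = len(rho)
    T = np.zeros((nS, nS))
    for i in range(nS):
        for j in range(nS):
            if i != j:
                T[i, j] = rho[i][j] / (nS - 1)
        T[i, i] = 1.0 - T[i].sum()
    eigenvalues, eigenvectors = np.linalg.eig(T.T)
    v = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1.0))])
    return v / v.sum()
```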

2.3 Strategies in the Repeated Prisoner's Dilemma

The repeated PD is usually known as a story of tit for tat (TFT), which won both of Axelrod's tournaments [4, 5]. TFT starts by cooperating, and then does whatever the opponent did in the previous round. It will cooperate if the opponent cooperated, and will defect if the opponent defected. But if there are erroneous moves because of noise (i.e., an intended move is wrongly performed with a given execution error probability, referred to here as noise), the performance of TFT declines, in two ways: (i) it cannot correct errors (e.g., when two TFTs are playing with one another, an erroneous defection by one player leads to a sequence of unilateral cooperation and defection) and (ii) a population of TFT players is undermined by random drift when AllC (always cooperate) mutants appear (which allows exploiters to grow). TFT is then replaced by generous tit for tat (GTFT), a strategy that cooperates if the opponent cooperated in the previous round, but sometimes cooperates even if the opponent defected (with a fixed probability p > 0) [45]. GTFT can correct mistakes, but remains subject to random drift.

Subsequently, TFT and GTFT were replaced by win-stay, lose-shift (WSLS) as the winning strategy chosen by evolution [46]. WSLS starts by cooperating, and repeats the previous move whenever it did well, but changes otherwise. WSLS corrects mistakes better than GTFT and does not suffer random drift. However, it is severely exploited by AllD (always defect) players.
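
For reference, the memory-one strategies discussed above can be written down directly; the sketch below (ours) maps last-round information to the next move, with an illustrative forgiveness probability for GTFT.

```python
import random

def allc(last_own, last_opp):
    return 'C'

def alld(last_own, last_opp):
    return 'D'

def tft(last_own, last_opp):
    # cooperate on the first round, then copy the co-player's last move
    return 'C' if last_opp in (None, 'C') else 'D'

def gtft(last_own, last_opp, p=0.3):
    # like TFT, but forgive a defection with probability p (p = 0.3 is only illustrative)
    if last_opp in (None, 'C'):
        return 'C'
    return 'C' if random.random() < p else 'D'

def wsls(last_own, last_opp):
    # repeat the last move after R or T ("win"), switch after S or P ("lose")
    if last_own is None:
        return 'C'
    return last_own if last_opp == 'C' else ('C' if last_own == 'D' else 'D')
```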

2.4 Plan Corpus Description

In the following, we describe how to create plan corpora for training and testing the described intention recognition models, for a given set of strategies. For the sake of generality, we start by making an assumption that all strategies to be recognized have memory size bounded above by M (M ≥ 0)—that is, their decision at the current round is independent of the past rounds that are at a time distance greater than M. To deal with such strategies, the corpus-based intention recognition strategy to be described is also equipped with a memory size of M, but has no need for more. Note that the strategies described above (TFT, GTFT, WSLS) have memory bounded by M = 1; hence to cope with them, the intention recognizer memorizes a single last interaction with them.

For clarity of representation, abusing notation, R, S, T, and P are henceforth also referred to as (elementary) game states, in a single round of interaction. Additionally, E (standing for empty) is used to refer to a game state having had no interaction. The most basic elements in a plan corpus are the corpus actions, having the following representation.

Definition 2.1 (Corpus action): An action in a plan corpus is of the form s1 ⋯ sM ξ, where si ∈ {E, R, T, S, P}, 1 ≤ i ≤ M, are the states of the M last interactions, and ξ ∈ {C, D} is the current move.1

Definition 2.2 (Plan session): A plan session of a strategy is a sequence of corpus actions played by that strategy (more precisely, a player using that strategy) against an arbitrary player.

We denote by Σ_M the set of all possible types of action, so that |Σ_M| = 2 × 5^M. For example, one can have
  Σ_1 = {EC, ED, RC, RD, SC, SD, TC, TD, PC, PD}, with |Σ_1| = 2 × 5^1 = 10.
This way of encoding actions and the assumption about the players' memory size lead to the equivalent assumption that the action in the current round is independent of the ones in previous rounds, regardless of the memory size. The independence of actions will allow us to derive a convenient and efficient intention recognition model, discussed in Section 2.5. Furthermore, it enables us to save the game states without having to save the co-player's moves, thus simplifying the representation of plan corpora.
As an example, let us consider TFT and the following sequence of its interactions with some other player (denoted by X), in the presence of noise:
  Round:            1  2  3  4  5
  TFT's move:       C  C  D  D  D (error)
  X's move:         C  D  D  C  —
  Resulting state:  R  S  P  T  —
The corresponding plan session for TFT is [EC, RC, SD, PD, TD]. At the 0th round, there is no interaction; thus the game state is E. TFT starts by cooperating (1st round); hence the first action of the plan session is EC. Since player X also cooperates in the 1st round, the game state at this round is R. TFT reciprocates in the 2nd round by cooperating; hence the second action of the plan session is RC. Similarly for the third and the fourth actions. Now, at the 5th round, TFT should cooperate, since X cooperated in the 4th round, but because of noise, it makes an error and defects. Therefore, the 5th action is TD.
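
The encoding just described is mechanical. A small sketch (ours, for M = 1) that reproduces this example:

```python
# Game state produced by a round, given (own move, co-player's move); E marks "no interaction".
STATE = {('C', 'C'): 'R', ('C', 'D'): 'S', ('D', 'C'): 'T', ('D', 'D'): 'P'}

def plan_session(own_moves, opp_moves):
    """Encode one player's moves as corpus actions s + move, s being the last game state (M = 1)."""
    session, state = [], 'E'
    for own, opp in zip(own_moves, opp_moves):
        session.append(state + own)
        state = STATE[(own, opp)]
    return session

# TFT's noisy run from the example (the co-player's fifth move is irrelevant for these actions):
print(plan_session(['C', 'C', 'D', 'D', 'D'], ['C', 'D', 'D', 'C', 'C']))
# -> ['EC', 'RC', 'SD', 'PD', 'TD']
```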

Definition 2.3 (Plan corpus): Let 𝒮 be a set of strategies to be recognized. A plan corpus for 𝒮 is a set of plan sessions generated for each strategy in the set.

For a given set of strategies, different plan corpora can be generated for different purposes. In Section 3.1, for example, we generate plan corpora for training and testing intention recognition models.

2.5 Corpus-Based Intention Recognition Model

We can use any corpus-based intention recognition model in the literature for this work. The most successful works are described in [2, 7, 8]. In [7], the authors use the bigram statistical model, making the assumption that the current action only depends on the previous one. In [2], the authors attempt to avoid this assumption by using the variable-order Markov model. In our work, because of the independence of (corpus) actions, we can derive an even simpler model than that in [7], as described below.

Let Ii, 1 ≤ i ≤ n, be the intentions to be recognized, and O = A1 ⋯ Am the current sequence of observed actions. The intention recognition task is to find the most likely intention I ∈ {I1, …, In} given the current sequence of observed actions A1 ⋯ Am, that is,
  I* = argmax_{I ∈ {I1, …, In}} P(I | A1 ⋯ Am)
     = argmax_{I ∈ {I1, …, In}} P(I) P(A1 | I) P(A2 | A1, I) ⋯ P(Am | A1 ⋯ Am−1, I) / P(A1 ⋯ Am)
The second equality is obtained by applying Bayes' rule and then the chain rule [7, 49]. Since the denominator P(A1 ⋯ Am) is a positive constant, we can ignore it. Then, because of the independence among actions for the strategies being recognized, we obtain
  I* = argmax_{I ∈ {I1, …, In}} P(I) ∏_{i=1..m} P(Ai | I)     (3)
Note that this simplified expression is derived independently of the memory size M. All the probabilities needed for this computation are to be extracted beforehand using a training plan corpus. There is no update of these probabilities during intention recognizers' life cycle. These constants are arguably obtainable by evolutionary means via lineage [54, 66]—the corpus represents ancestors' experience in interacting with known strategies, or the “like me” self-experimenting framework [39, 40]. They are learned once, and are used only for the purpose of performing intention recognition.

Also note that if two intentions are assessed with the same probability, the model predicts the one with the higher priority. Priorities of intentions are set depending on the behavioral attitude of the intention recognizer. For example, in Figure 2, discussed in Section 3.3, if IR's co-player cooperates in the first round, the co-player's intention can be predicted as either AllC, WSLS, or TFT (since they are assigned the same conditional probability values). Because IR is concerned about retaliation by TFT and WSLS after a defection (this concern is IR's behavioral attitude), WSLS and TFT should have higher priorities than AllC.
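
Putting the pieces together, a minimal recognizer of this kind might look as follows; the Laplace smoothing and the explicit priority-based tie-breaking are our own additions for illustration, not part of the original model.

```python
import math
from collections import defaultdict

def train(corpus):
    """corpus: dict mapping each intention to a list of plan sessions (lists of corpus actions)."""
    vocabulary = {a for sessions in corpus.values() for session in sessions for a in session}
    total_sessions = sum(len(sessions) for sessions in corpus.values())
    priors, likelihoods = {}, {}
    for intention, sessions in corpus.items():
        priors[intention] = len(sessions) / total_sessions
        counts = defaultdict(int)
        for session in sessions:
            for action in session:
                counts[action] += 1
        n, V = sum(counts.values()), len(vocabulary) + 1   # Laplace smoothing (our addition)
        likelihoods[intention] = {a: (counts[a] + 1) / (n + V) for a in vocabulary}
        likelihoods[intention]['__unseen__'] = 1.0 / (n + V)
    return priors, likelihoods

def recognize(observed, priors, likelihoods, priority):
    """Return argmax_I P(I) * prod_i P(A_i | I); priority maps each intention to a numeric
    rank (larger = preferred) used only to break ties."""
    def log_score(I):
        s = math.log(priors[I])
        for action in observed:
            s += math.log(likelihoods[I].get(action, likelihoods[I]['__unseen__']))
        return s
    return max(priors, key=lambda I: (log_score(I), priority[I]))
```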

2.6 Intention-Based Decision-Making Model

We describe how to acquire a decision-making model for an intention recognizer from a training plan corpus. The intention recognizer chooses to play what would provide it the greatest expected payoff against the recognized strategy (intention). Namely, from training data we need to extract the function θ(s, I):
  θ : {E, R, T, S, P}^M × {I1, …, In} → {C, D},
deciding what to play (C or D) given the sequence of the M last game states, s = s1 ⋯ sM (si ∈ {E, R, T, S, P} with 1 ≤ i ≤ M), and the recognized intention I ∈ {I1, …, In}. This means that the intention recognizer needs to memorize the states of at most the M last interactions with its co-player, besides the fixed set of probabilistic constants in Equation 3, used specifically for the purpose of intention recognition. That is done as follows. From the plan sessions in the training corpus for each intention we compute the (per-round) average payoff the intention recognizer would receive with respect to each choice (C or D), for each possible sequence of states s. The choice giving the greater payoff is chosen. Formally, let DS(I) be the set of all sequences of actions (plan sessions), Sq = A1 ⋯ Ak (Ai ∈ Σ_M, 1 ≤ i ≤ k), for intention I in the corpus, and π(Sq, j) the payoff the intention recognizer would get at round j. In the following, if the plan session in which the payoff is being computed is clear from the context, we omit it and simply write π(j). Thus,
formula
where
formula
and
formula
A more general version of this decision-making model is provided by considering a discount factor, say 1/α with 0 < α ≤ 1, stating by how much the payoffs of distant rounds are less important:
formula
where
formula
and
formula
The first model is a special case of the second one where α = 1. This is the most future-optimistic case. If α is small enough (α ≈ 0), then θ(s, I) = D: In a given round of the PD, it is always best to defect. This is the future-pessimistic case.

Note that, at the first round, there is no information about the co-player. The intention recognizer cooperates, that is, θ(E^M, I) = C for all I ∈ {I1, …, In}.
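
To make the construction concrete, the sketch below tabulates a decision rule of this kind from recorded sessions, under one possible reading of the description above (M = 1): for each last game state s and each intention I, it compares the discounted average payoff that the deciding player obtained after choosing C versus D at positions with state s. Sessions are assumed to be recorded as (decider's move, co-player's move) pairs, with the co-player following strategy I, and the payoff values are only examples; none of this should be read as the authors' exact formulas.

```python
from collections import defaultdict

PAYOFF = {('C', 'C'): 1, ('C', 'D'): -1, ('D', 'C'): 2, ('D', 'D'): 0}   # R, S, T, P (example values)
STATE  = {('C', 'C'): 'R', ('C', 'D'): 'S', ('D', 'C'): 'T', ('D', 'D'): 'P'}

def learn_theta(sessions_by_intention, alpha=1.0):
    """theta[(s, I)]: the move with the larger discounted average payoff after state s against I."""
    theta = {}
    for intention, sessions in sessions_by_intention.items():
        totals = defaultdict(lambda: [0.0, 0])         # (state, move) -> [sum of values, count]
        for session in sessions:                       # session: list of (decider, co-player) moves
            state = 'E'
            for j, (mine, theirs) in enumerate(session):
                value, weight = 0.0, 1.0
                for later_mine, later_theirs in session[j:]:   # discounted payoffs from round j on
                    value += weight * PAYOFF[(later_mine, later_theirs)]
                    weight *= alpha
                totals[(state, mine)][0] += value
                totals[(state, mine)][1] += 1
                state = STATE[(mine, theirs)]
        for s in ['E', 'R', 'S', 'T', 'P']:
            avg = {}
            for move in ('C', 'D'):
                total, count = totals[(s, move)]
                avg[move] = total / count if count else float('-inf')
            theta[(s, intention)] = 'C' if avg['C'] >= avg['D'] else 'D'
    return theta
```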

3 Model Acquisition and Evaluation

3.1 Plan Corpus Generation

With the corpus description provided in Section 2.4, let us start by generating a plan corpus of four of the best-known strategies within the framework of repeated games of cooperation: AllC (always cooperate), AllD (always defect), TFT, and WSLS (see above). Not only do these strategies constitute the most-used corpus in this context, but most other strategies can be seen as a high-level composition of the principles embodied in these strategies. Hence, intention recognizers map their opponent's behaviors to the closest strategy that they know, and interact accordingly. When their knowledge is extended to incorporate new strategies, the models can be revised on the fly. However, this possibility is beyond the scope of this article.

Let us further note that here we are not trying to design any optimal strategy for the repeated PD. Indeed, our aim is solely to provide a minimal model that supports the idea that intention recognition may prevail and evolve in the presence of cooperation dilemmas. While intention recognition may have played an important evolutionary role [39, 76, 87], it is difficult to conceive of it as the key cognitive skill toward cooperation. Nevertheless, we could easily include other, more complex strategies, such as those with greater memory sizes and/or learning capability [30, 38], and let the intention recognizers learn to recognize them in a similar manner, by equipping the recognizers with memory capacity concomitant with their co-players'. As repeated games provide endless possibilities for the cognitive principles behind each particular strategy (as can be easily understood), it was not our goal to provide an exhaustive study of all variants of the strategies analyzed here.

We collect plan sessions of each strategy by playing a random move (C or D) in each round with it. To be more thorough, we can also play all possible combinations for each given number of rounds, m. For example, if m = 10, there will be 1024 = 210 combinations—C or D in each round. Moreover, since interactions in the presence of noise are taken into consideration, each combination is played several times.

The training corpus to be used here is generated by playing with each strategy all the possible combinations 20 times, for each number of rounds, m, from 5 to 10. The testing data set is generated by playing a random move with each strategy in each round, also for m from 5 to 10. We continue until we obtain the same number of plan sessions as for the training data set (corpus). Both data sets are generated in the presence of noise (namely, an intended move is performed wrongly with probability 0.05).
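
A sketch of the random-probe variant of this generation step (used here for the testing set) is shown below; it reuses the memory-one strategy functions of Section 2.3 and the plan-session encoder of Section 2.4 sketched earlier, and flips an intended move with the stated error probability. The exhaustive training variant would instead enumerate all 2^m probe sequences. All names are ours.

```python
import random

def noisy(move, eps=0.05):
    """An intended move is performed wrongly with probability eps."""
    return {'C': 'D', 'D': 'C'}[move] if random.random() < eps else move

def generate_sessions(strategy, rounds_range=range(5, 11), repetitions=20, eps=0.05):
    """Collect noisy plan sessions of `strategy` against a randomly playing probe."""
    sessions = []
    for m in rounds_range:
        for _ in range(repetitions):
            own_moves, opp_moves, last_own, last_opp = [], [], None, None
            for _ in range(m):
                own = noisy(strategy(last_own, last_opp), eps)   # the strategy's (noisy) move
                opp = random.choice(['C', 'D'])                  # the probe plays at random
                own_moves.append(own)
                opp_moves.append(opp)
                last_own, last_opp = own, opp
            sessions.append(plan_session(own_moves, opp_moves))  # encoder from Section 2.4
    return sessions
```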

3.2 Intention Recognition Model

3.2.1 Evaluation Metrics

For evaluating the intention recognition model, we use three different metrics. Precision and recall report the number of correct predictions divided by the numbers of total predictions and total prediction opportunities, respectively. If the intention recognizer always makes a prediction (whenever it has the opportunity), recall is equal to precision. Convergence is a metric that indicates how much time the recognizer took to converge on what the current user goal/intention was. Formal definitions of the metrics can be found in [2].

3.2.2 Results

The intention recognition model is acquired using the training corpus. Table 1 shows the recognition results of the model for the testing data set, using the three metrics described above. We show the recognition result for each strategy and for the whole data set. Given that the training as well as the testing data sets are generated in the presence of noise, the achieved intention recognition performance is quite good. In the next section, we study the performance of players using this intention recognition model together with the intention-based decision-making model (IR players) in large-scale population settings—particularly to address "What is the role of intention recognition for the emergence of cooperation?"

Table 1

Intention recognition results for each strategy and the total.

Strategy     AllC     AllD     TFT      WSLS     Total
Precision    0.859    0.999    0.818    0.579    0.824
Recall       0.859    0.999    0.818    0.575    0.824
Converg.     0.859    0.999    0.719    0.579    0.805

3.3 Decision-Making Model

The decision-making model (in Section 2.6) is acquired using the training corpus (Figure 1). We henceforth use α = 1; that is, independently of the PD payoff matrices (used in this article), if at the current round the co-player's recognized intention is an unconditional one (AllC or AllD), IR always defects, regardless of the current game states; if it is TFT, IR always cooperates; and if it is WSLS, IR cooperates if and only if the current state is either R or P.

Figure 1. 

Decision-making model for different values of α. If the recognized intention is AllC or AllD, intention recognizers (IR) always defect, regardless of the current states. If it is TFT, IR cooperates when α is large enough, regardless of the current states. If it is WSLS, then if the current states are S or T, IR always defects; otherwise, IR cooperates for large enough α. This model is acquired for a PD with R = 1, S = −1, T = 2, P = 0. The model has the same behavior for all PD payoff matrices used in this article.

From here it is clear that, similar to WSLS and TFT, at each round, the strategy IR grounds its decision solely on the last move of others (M = 1). The remaining memory capacity required for the functioning of IR is used for the purpose of identifying its co-player's intention. That is to say, if IR is provided with its co-player's intention, whether by some other, possibly cheaper intention recognition methods or by heuristics, it only needs to remember the last interaction. That means the success of our IR strategy, described below, is exclusively governed by the effect of taking into account its co-player's intention in choosing a move, not its larger memory capacity, used once to identify that intention.2 In this respect, we also note that the important role of intention-based decision making has been recognized in a diversity of experimental studies in behavioral economics, for example, in [16, 17, 53].

Figure 2 shows how an IR using the intention recognition and intention-based decision-making models acquired above interacts with other strategies, including AllC, AllD, TFT, WSLS, and another IR, in the absence of noise. Except with AllD, IR plays C in the first two rounds with other strategies: IR always plays C in the first round, and since others also play C (thus, the action is EC), they are predicted to be TFT (since P(EC|AllC) = P(EC|TFT) = P(EC|WSLS) ≫ P(EC|AllD))—therefore, IR plays C in the second round. Note that here TFT is set with a higher priority than WSLS, which in turn has a higher priority than AllC. In the third round, these strategies are all predicted to be AllC, since they play C in the second round (and since P(RC|AllC) > P(RC|WSLS) > P(RC|TFT)). Hence, IR plays D in this round. The moves of these strategies (the other IR plays D, others play C) classify IR as WSLS, and the other three remain AllC, since P(RD|WSLS) > P(RD|TFT) ≫ P(RD|AllC). The two inequalities P(RC|WSLS) > P(RC|TFT) and P(RD|WSLS) > P(RD|TFT), for a big enough training corpus, are easily seen to hold: Although, after R, TFT and WSLS are equally likely to play C (and D), since WSLS corrects mistakes better than TFT, mutual cooperation is more frequent in plan sessions for WSLS in the training corpus. The reaction in the fourth round classifies TFT as TFT, IR and WSLS as WSLS, and AllC as AllC; and likewise in the subsequent rounds. From the fifth round on, IR cooperates with WSLS, TFT, and another IR. If the number of rounds to be played is very large, then up to some late round, these three strategies will be recognized as AllC again (since P(RC|AllC) > P(RC|WSLS) > P(RC|TFT)); then the process repeats from the third round. In our corpus, that only happens after more than 100 rounds. In playing with an AllD, IR cooperates in the first round and defects in the remaining rounds, since P(ED|AllD) ≫ P(ED|I) for all I ∈ {AllC, TFT, WSLS} and furthermore P(s|AllD) ≫ P(s|I) for all I ∈ {AllC, TFT, WSLS} and s ∈ {RD, SD, TD, PD}.

Figure 2. 

Interactions of IR with AllC, AllD, TFT, WSLS, and another IR, in the absence of noise and for α = 1.

4 Experiments and Results

In the following we provide analytical results under different evolutionary dynamics as well as using computer-based simulations. We show that the introduction of intention recognition promotes the emergence of cooperation in various settings, even in the presence of noise.

4.1 Analysis

To begin with, let us consider a population of AllC, AllD, and IR players. They play the repeated PD. Suppose m (m < 100) is the average number of rounds. In the absence of noise, the payoff matrix of AllC, AllD, and IR in m rounds is given by (Figure 2, α = 1)
              AllC              AllD              IR
  AllC        mR                mS                2R + (m − 2)S
  AllD        mT                mP                T + (m − 1)P
  IR          2R + (m − 2)T     S + (m − 1)P      (m − 1)R + P
In each round, AllC cooperates. Thus, its co-player will obtain a reward R if it cooperates and a temptation payoff T otherwise. Hence, in playing with AllC (first column of the matrix), another AllC obtains m times R, since it cooperates in each round; AllD obtains m times T, since it defects in each round; and IR obtains 2 times R and m − 2 times T, since it cooperates with AllC in the first two rounds and defects in the remaining rounds (Figure 2). Other elements of the matrix are computed similarly.
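
The same round-by-round reasoning can be packaged as a small helper (ours); in particular, the IR-versus-IR entry follows our reading of Figure 2, with a single round of mutual defection in the third round.

```python
def repeated_pd_payoffs(R, S, T, P, m):
    """m-round payoffs (row player vs. column player) for AllC, AllD, and IR, no noise."""
    return {
        ('AllC', 'AllC'): m * R,
        ('AllC', 'AllD'): m * S,
        ('AllC', 'IR'):   2 * R + (m - 2) * S,   # IR cooperates twice, then defects
        ('AllD', 'AllC'): m * T,
        ('AllD', 'AllD'): m * P,
        ('AllD', 'IR'):   T + (m - 1) * P,       # IR cooperates only in the first round
        ('IR',   'AllC'): 2 * R + (m - 2) * T,
        ('IR',   'AllD'): S + (m - 1) * P,
        ('IR',   'IR'):   (m - 1) * R + P,       # mutual defection only in the third round
    }
```
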
Pairwise comparisons [27, 43, 67] of the three strategies lead to the conclusions that AllC is dominated by IR and that IR is an evolutionarily stable strategy [27] if
  (m − 1)R + P > T + (m − 1)P,
which always holds for m ≥ 3 (since 2R > T + P and R > P). An evolutionarily stable strategy is a strategy that, if adopted by a population of players, cannot be invaded by any alternative strategy that is initially rare [27]. This condition guarantees that once IR dominates the population, it becomes stable (for m ≥ 3).

Furthermore, one can show that

  1. IR is risk-dominant [27, 67] against AllD if R(m − 1) + S > P(m − 1) + T, which is equivalent to
    m > 1 + (T − S)/(R − P).     (7)
    For the donor game, this is equivalent to m > 2b/(b − c).
  2. IR is advantageous [27, 67] against AllD if R(m − 1) + 2S > T + Pm, which is equivalent to
    m > (R + T − 2S)/(R − P).     (8)
    For the donor game, this is equivalent to m > (2b + c)/(b − c).

Since IR and AllD are both evolutionarily stable strategies, Equation 7 provides the condition for which IR has the greater basin of attraction [43, 67]. Equation 8 provides the condition for which natural selection favors an IR to replace a population of AllDs, that is, IR has a fixation probability greater than the neutral one (1/N) [43, 67].
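
As a quick numeric check of the two donor-game thresholds, for hypothetical values b = 3 and c = 1:

```python
b, c = 3.0, 1.0                                                  # hypothetical benefit and cost
print("risk dominance requires   m >", 2 * b / (b - c))          # 3.0
print("advantageousness requires m >", (2 * b + c) / (b - c))    # 3.5
```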

4.2 Evolutionary Simulations

In the presence of noise, it is not easy to provide an exact mathematical analysis. Instead, we study this case using computer simulations. For convenience and a clear representation of simulation results, we perform our simulations using the donor game [67], that is, T = b, R = b − c, P = 0, S = −c, satisfying b > c > 0.

We start with a well-mixed population of size N, with individuals using different strategies. In each round of a generation, each individual interacts with all others, engaging in a PD game. The payoffs are accumulated over all the rounds. After each generation, an individual is randomly selected from the population and will adopt the strategy of another randomly selected individual, using the pairwise comparison rule [73, 78] described in Section 2.2.
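
One generation of this procedure can be sketched as follows (an illustration under our assumptions, not the authors' code); play(a, b, m) is assumed to return the two accumulated payoffs of an m-round interaction, and the update step is the pairwise comparison sketched in Section 2.2.

```python
def one_generation(population, play, m, beta):
    """Everyone plays everyone once for m rounds; then one random imitation event occurs."""
    N = len(population)
    fitness = [0.0] * N
    for i in range(N):
        for j in range(i + 1, N):
            payoff_i, payoff_j = play(population[i], population[j], m)
            fitness[i] += payoff_i
            fitness[j] += payoff_j
    pairwise_comparison_step(population, fitness, beta)   # Fermi update from Section 2.2
```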

The results for some different settings are shown in Figure 3. Our results show that IR always prevails against other strategies, including TFT, WSLS, and GTFT, for different benefit-to-cost ratios b/c, as well as being more robust to noise. Namely, it has a strictly larger range of the benefit-to-cost ratio where cooperation can emerge, and can maintain it under a higher level of noise.

Figure 3. 

Simulation results for the donor game. In panels (a) and (b), we consider populations of three strategies: AllC, AllD, and either IR, TFT, or WSLS—equally distributed at the beginning. We plot the final fractions of IR, TFT, and WSLS. All simulations end up in a homogeneous state (i.e., having only one type of strategy) in less than 5,000 generations. Our results show that IR prevails over TFT and WSLS for different benefit-to-cost ratios b/c (panel (a)) and for different levels of noise (panel (b)). For a small ratio b/c (around 1.2), IR starts having an opportunity to win, and from around 1.4 the population always converges to the homogeneous state of IR. For TFT, these two latter values are 1.4 and 2.1, respectively. WSLS has no chance to win for b/c ≤ 2.4. The dashed black curve in (a) shows that the fraction of cooperation in the populations of AllC, AllD, and IR is monotonic in b/c. In (b), our result shows that, in the presence of noise, IR outperforms TFT and WSLS. This result is robust to changes in the value of b/c (see the inset of panel (b)). In panels (c) and (d), we consider a more complex setting where the population consists of several types of strategies: AllC, AllD, TFT, WSLS, and GTFT (where the probability of forgiving a defect is 0.5), with IR (panel (c)) or without IR (panel (d)). Apart from the defective AllD and from IR, the strategies are cooperative. Thus, instead of initially being equally distributed, we include a higher fraction of AllDs in the initial population. Namely, each type has 40 individuals, and AllD has 80. IR always wins (panel (c)). However, if the IR individuals are removed, AllD is the winner (panel (d)), showing how IRs work as a catalyst for cooperation. We have tested and obtained similar results for larger population sizes. Finally, in (a) and (b) we show how WSLS performs badly, as WSLS needs TFTs as a catalyst to perform well [67]—which can be observed in panels (c) and (d). All results were obtained averaging over 100 runs, for m = 10, N = 100, and β = 0.1.

4.3 Intensity of Selection

We consider a setting where five strategies AllC, AllD, TFT, WSLS, and IR are present in the population. We numerically compute the stationary distributions for varying intensity of selection β (Figure 4), using the method described in Section 2.2. The results show that, for small enough noise, the population always spends more time in the homogeneous state of IR, especially for strong intensities of selection (Figure 4a). When noise is large, WSLS wins for strong intensities of selection, but IR still wins for weak ones (Figure 4b).

Figure 4. 

Stationary distribution (in percent) of each strategy, depending on the intensity of selection, β. The population consists of five strategies: AllC, AllD, TFT, WSLS, and IR. For small values of β, selection is nearly neutral. Strategy updating is mostly random, and frequencies of all strategies are roughly equal. Discrimination between strategies occurs when β increases. (a) When noise is small, IR always wins. (b) When noise is large, IR wins for small β; WSLS wins when β is large. All calculations are made with b/c = 3, m = 10, N = 100. When noise is present, the average payoffs of each strategy are obtained by averaging 10^7 runs.

Note that the case in which the intensity of selection β is very small is commonly referred to in the literature as weak selection [44, 78, 88]. Weak selection describes the situation in which the returns from the game represent a small perturbation to the fitness of an individual [1, 11, 80].

4.4 Cognitive Cost of Intention Recognition

We now extend our model to take into account the (cognitive) cost needed to perform intention recognition. Let us denote this cost of cognition by ς. In each interaction, it is subtracted from the payoff of IRs. Given the difficulty of assessing the costs associated with a given cognitive skill—see, for example, [13, 30, 38], where cognition costs were not considered significant on the time scale of strategy evolution—we are not able to provide a particular value, even in relative terms. For this reason, we provide an analysis of the range of ς under which IRs still prevail, identifying the values of ς above which IR will, as expected, be washed out from the population.

In a population of three strategies, AllC, AllD, and IR, and in the absence of noise, we derive some analytical results for ς under which IR is favored by natural selection in different settings. First, we can rewrite the payoff matrix as follows:
              AllC                   AllD                  IR
  AllC        mR                     mS                    2R + (m − 2)S
  AllD        mT                     mP                    T + (m − 1)P
  IR          2R + (m − 2)T − ς      S + (m − 1)P − ς      (m − 1)R + P − ς
Similar to the analysis in Section 4.1, pairwise comparisons of the three strategies lead to the conclusion that IR is an evolutionarily stable strategy if
  ς < (m − 1)(R − P) − (T − P).
This inequality holds if m ≥ 3 (guaranteeing that the right-hand side is positive) and the cognitive cost ς is small enough. These conditions also guarantee that AllC is dominated by IR.
One can show that IR is risk-dominant [27, 67] against AllD if
  ς < [(m − 1)(R − P) + S − T]/2.     (10)
Furthermore, IR is advantageous [27, 67] against AllD if
  ς < [(m − 1)R + 2S − T − mP]/3.     (11)
Equation 10 provides the condition on ς for which IR has the greater basin of attraction [43, 67]. Furthermore, Equation 11 provides the condition on ς for which natural selection favors an IR to replace a population of AllDs, that is, IR has a fixation probability greater than the neutral one (1/N) [43, 67].

Now let us consider a population in which all five strategies AllC, AllD, TFT, WSLS, and IR are present. Our simulation result (Figure 5) shows that if the cost of performing intention recognition is small enough, IR prevails: The population spends most of the time in the homogeneous state of IR. However, as ς increases, the advantage of IR is, as expected, undermined by this external cost; nevertheless, the conditions for cooperation still prevail in the long run, as IR is replaced by WSLS.

Figure 5. 

Stationary distribution (in percent) of each strategy, depending on the cognition cost ς. The population consists of five strategies: AllC, AllD, TFT, WSLS, and IR. For small enough values of ς, the population spends most of the time in the homogeneous state of IR. When ς is large, WSLS prevails. All calculations are made with b = 3, c = 1, m = 10, N = 100, ϵ = 0.01. When noise is present, the average payoffs of each strategy are obtained by averaging 10^7 runs.

5 Concluding Remarks

Using the tool of evolutionary game theory, we have shown, analytically as well as by extensive computer simulations, that intention recognition may coevolve with cooperation. From this coevolution, cooperation prevails in the long run. Given the broad spectrum of problems that are addressed using this cooperative metaphor, our results indicate how intention recognition can be pivotal in social dynamics. Individuals who are equipped with an ability to recognize intentions of others (i.e., intention recognizers) can quickly recognize the unconditional defectors (AllDs), thus not being exploited by them as the WSLS strategy is. For their own benefit, the intention recognizers can recognize and exploit the unconditional cooperators AllCs, and thus do not suffer random drift as TFT and GTFT do. Furthermore, the intention recognizers are cooperative with the conditional cooperators, including TFT, WSLS, and players like themselves.

We have shown that a population with some initial fraction of intention recognizers acting selfishly to achieve the greatest benefit can lead to a stable cooperation where intention recognizers come to prevail upon and permeate the population. The intention recognition strategy has a greater range of benefit-to-cost ratios leading to cooperation than do the most successful existing strategies, including TFT and WSLS. We have also shown that it is more robust to noise, and it prevails under a wide range of intensities of selection. Furthermore, all mentioned results are robust, assuming that the cost of performing intention recognition is small enough compared to the cost of attempting to cooperate. This outcome is only undermined in the presence of external factors, represented here by a cost associated with higher cognitive skills.

In addition, our approach of using a plan corpus makes a case for other artificial intelligence techniques to work with the problem of cooperation. In this work, we studied the role of intention recognition for the emergence of cooperation, but other cognitive abilities, such as pattern recognition, are also of great interest and importance. Classification algorithms (and supervised learning in general) are clearly a good candidate. Indeed, intention recognition can be considered as a classification problem: The sequence of observed actions is classified into a known strategy. Clustering algorithms (and unsupervised learning in general) can be used to categorize the sequences of actions that do not fit with the known strategies. This is a way to learn about unknown strategies, categorize them, and revise the model to take them into account (and pass the revised model to the successors). Bridging such sophisticated techniques with human evolution and behavior remains an open challenge, both for artificial intelligence and for theoretical biology. Following an already long tradition in artificial life research, our study provides a step further in that direction.

Acknowledgments

T.A.H. and F.C.S. acknowledge support from FCT-Portugal (grant SFRH/BD/62373/2009 and R&D project PTDC/FIS/101248/2008, respectively).

Notes

1 

From now on, this notion of an action is used, which is different from the notion of a move (C or D).

2 

This should indeed be the case, because the link between memory size and the most abstract aspects of intelligence must be an indirect one: Conceptual thinking can hardly be reduced to simple memory storage and retrieval [15]. The development (and evolution) of complex cognitive abilities requires long-term memory capacity, but this is obviously not the whole story.

References

1. 
Akashi
,
H.
(
1995
).
Inferring weak selection from patterns of polymorphism and divergence at “silent” sites in Drosophila DNA.
Genetics
,
139
(2)
,
1067
1076
.
2. 
Armentano
,
M. G.
, &
Amandi
,
A.
(
2009
).
Goal recognition with variable-order Markov models.
In
Proceedings of the 21st International Joint Conference on Artificial Intelligence
(pp.
1635
1640
).
Morgan Kaufmann
.
3. 
Arthur
,
W. B.
(
1994
).
Inductive reasoning and bounded rationality.
American Economic Review
,
84
(2)
,
406
411
.
4. 
Axelrod
,
R.
(
1984
).
The evolution of cooperation.
New York
:
Basic Books
.
5. 
Axelrod
,
R.
, &
Hamilton
,
W. D.
(
1981
).
The evolution of cooperation.
Science
,
211
,
1390
1396
.
6. 
Binmore
,
K. G.
(
2005
).
Natural justice.
Oxford, UK
:
Oxford University Press
.
7. 
Blaylock
,
N.
, &
Allen
,
J.
(
2003
).
Corpus-based, statistical goal recognition.
In
Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI'03)
(pp.
1303
1308
).
8. 
Blaylock
,
N.
, &
Allen
,
J.
(
2004
).
Statistical goal parameter recognition.
In S. Zilberstein, J. Koehler, & S. Koenig (Eds.)
,
Proceedings of the 14th International Conference on Automated Planning and Scheduling (ICAPS04)
(pp.
297
304
).
9. 
Bratman
,
M. E.
(
1987
).
Intention, plans, and practical reason.
Stanford, CA
:
Center for the Study of Language and Information
.
10. 
Bratman
,
M. E.
(
1999
).
Faces of intention: Selected essays on intention and agency.
Cambridge, UK
:
Cambridge University Press
.
11. 
Charlesworth
,
J.
, &
Eyre-Walker
,
A.
(
2007
).
The other side of the nearly neutral theory, evidence of slightly advantageous back-mutations.
Proceedings of the National Academy of Sciences of the U.S.A.
,
104
(43)
,
16992
16997
.
12. 
Charniak
,
E.
, &
Goldman
,
R. P.
(
1993
).
A Bayesian model of plan recognition.
Artificial Intelligence
,
64
(1)
,
53
79
.
13. 
Delton
,
A. W.
,
Krasnow
,
M. M.
,
Cosmides
,
L.
, &
Tooby
,
J.
(
2011
).
Evolution of direct reciprocity under uncertainty can explain human generosity in one-shot encounters.
Proceedings of the National Academy of Sciences of the U.S.A.
,
108
(32)
,
13335
13340
.
14. 
Dunbar
,
R.
(
2012
).
Science of love and betrayal.
London
:
Faber & Faber
.
15. 
Fagot
,
J.
, &
Cook
,
R. G.
(
2006
).
Evidence for large long-term memory capacities in baboons and pigeons and its implications for learning and the evolution of cognition.
Proceedings of the National Academy of Sciences of the U.S.A.
,
103
(46)
,
17564
17567
.
16. 
Falk
,
A.
,
Fehr
,
E.
, &
Fischbacher
,
U.
(
2008
).
Testing theories of fairness—Intentions matter.
Games and Economic Behavior
,
62
(1)
,
287
303
.
17. 
Frank
,
R. H.
,
Gilovich
,
T.
, &
Regan
,
D. T.
(
1993
).
The evolution of one-shot cooperation: An experiment.
Ethology and Sociobiology
,
14
(4)
,
247
256
.
18. 
Fudenberg
,
D.
, &
Imhof
,
L. A.
(
2005
).
Imitation processes with small mutations.
Journal of Economic Theory
,
131
,
251
262
.
19. 
Fudenberg
,
D.
, &
Levine
,
D. K.
(
1998
).
The theory of learning in games.
Cambridge, MA
:
MIT Press
.
20. 
Geib
,
C. W.
, &
Goldman
,
R. P.
(
2009
).
A probabilistic plan recognition algorithm based on plan tree grammars.
Artificial Intelligence
,
173
(2009)
,
1101
1132
.
21. 
Han
,
T. A.
, &
Pereira
,
L. M.
(
2011
).
Context-dependent incremental intention recognition through Bayesian network model construction.
In A. Nicholson (Ed.)
,
Proceedings of the Eighth UAI Bayesian Modeling Applications Workshop (UAI-AW 2011)
(pp.
50
58
).
22. 
Han
,
T. A.
, &
Pereira
,
L. M.
(
2011
).
Intention-based decision making with evolution prospection.
In L. Antunes & H. S. Pinto (Eds.)
,
Proceedings of 15th Portuguese Conference on Artificial Intelligence (EPIA'2011)
(pp.
254
267
).
23. 
Hardin
,
G.
(
1968
).
The tragedy of the commons.
Science
,
162
,
1243
1248
.
24. 
Hauert
,
C.
, &
Schuster
,
H.
(
1997
).
Effects of increasing the number of players and memory size in the iterated prisoner's dilemma: A numerical approach.
Proceedings of the Royal Society B: Biological Sciences
,
264
(1381)
,
513
519
.
25. 
Hauert
,
C.
,
Traulsen
,
A.
,
Brandt
,
H.
,
Nowak
,
M. A.
, &
Sigmund
,
K.
(
2007
).
Via freedom to coercion: The emergence of costly punishment.
Science
,
316
,
1905
1907
.
26. 
Heinze
,
C.
(
2003
).
Modeling intention recognition for intelligent agent systems.
Ph.D. thesis, The University of Melbourne, Australia
.
27. 
Hofbauer
,
J.
, &
Sigmund
,
K.
(
1998
).
Evolutionary games and population dynamics.
Cambridge, UK
:
Cambridge University Press
.
28. 
Iacoboni
,
M.
,
Molnar-Szakacs
,
I.
,
Gallese
,
V.
,
Buccino
,
G.
,
Mazziotta
,
J. C.
, &
Rizzolatti
,
G.
(
2005
).
PLoS Biology
,
3
(3)
,
e79
.
29. 
Imhof
,
L. A.
,
Fudenberg
,
D.
, &
Nowak
,
M. A.
(
2005
).
Evolutionary cycles of cooperation and defection.
Proceedings of the National Academy of Sciences of the U.S.A.
,
102
,
10797
10800
.
30. 
Janssen
,
M.
(
2008
).
Evolution of cooperation in a one-shot prisoner's dilemma based on recognition of trustworthy and untrustworthy agents.
Journal of Economic Behavior and Organization
,
65
(3–4)
,
458
471
.
31. 
Johnson-Laird
,
P. N.
(
2010
).
Mental models and human reasoning.
Proceedings of the National Academy of Sciences of the U.S.A.
,
107
(43)
,
18243
18250
.
32. 
Kareev
,
Y.
(
1995
).
Through a narrow window: Working memory capacity and the detection of covariation.
Cognition
,
56
(3)
,
263
269
.
33. 
Karlin
,
S.
, &
Taylor
,
H. E.
(
1975
).
A first course in stochastic processes.
New York
:
Academic Press
.
34. 
Kautz
,
H.
, &
Allen
,
J. F.
(
1986
).
Generalized plan recognition.
In
Proceedings of the Conference of the American Association of Artificial Intelligence (AAAI-86)
(pp.
32
38
).
35. 
Kim
,
K.-J.
, &
Lipson
,
H.
(
2009
).
Towards a “theory of mind” in simulated robots.
In
Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late breaking papers, GECCO '09
(pp.
2071
2076
).
36. 
Kraus
,
S.
(
1997
).
Negotiation and cooperation in multi-agent environments.
Artificial Intelligence
,
94
(1–2)
,
79
98
.
37. 
Macy
,
M. W.
, &
Flache
,
A.
(
2002
).
Learning dynamics in social dilemmas.
Proceedings of the National Academy of Sciences of the U.S.A.
,
99
,
7229
7236
.
38. 
McNally
,
L.
,
Brown
,
S. P.
, &
Jackson
,
A. L.
(
2012
).
Cooperation and the evolution of intelligence.
Proceedings of the Royal Society B: Biological Sciences
,
279
(1740)
,
3027
3034
.
39. 
Meltzoff
,
A. N.
(
2005
).
Imitation and other minds: The “like me” hypothesi.
In
Perspectives on imitation: From neuroscience to social science. Imitation, human development, and culture
(pp.
55
77
).
Cambridge, MA
:
MIT Press
.
40. Meltzoff, A. N. (2007). The framework for recognizing and becoming an intentional agent. Acta Psychologica (Amsterdam), 124(1), 26–43.
41. Milinski, M., Semmann, D., Krambeck, H., & Marotzke, J. (2006). Stabilizing the earth's climate is not a losing game: Supporting evidence from public goods experiments. Proceedings of the National Academy of Sciences of the U.S.A., 103(11), 3994–3998.
42. Nakahara, K., & Miyashita, Y. (2005). Understanding intentions: Through the looking glass. Science, 308(5722), 644–645.
43. Nowak, M. A. (2006). Five rules for the evolution of cooperation. Science, 314(5805), 1560.
44. Nowak, M. A., Sasaki, A., Taylor, C., & Fudenberg, D. (2004). Emergence of cooperation and evolutionary stability in finite populations. Nature, 428, 646–650.
45. Nowak, M. A., & Sigmund, K. (1992). Tit for tat in heterogeneous populations. Nature, 355, 250–253.
46. Nowak, M. A., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit for tat in prisoner's dilemma. Nature, 364, 56–58.
47. Nowak, M. A., & Sigmund, K. (2005). Evolution of indirect reciprocity. Nature, 437, 1291–1298.
48. Pacheco, J. M., Santos, F. C., & Chalub, F. A. C. C. (2006). Stern-judging: A simple, successful norm which promotes cooperation under indirect reciprocity. PLoS Computational Biology, 2(12), e178.
49. Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.
50. Pereira, L. M., & Han, T. A. (2009). Intention recognition via causal Bayes networks plus plan generation. In Progress in Artificial Intelligence, Proceedings of 14th Portuguese International Conference on Artificial Intelligence (EPIA'09) (pp. 138–149).
51. Pereira, L. M., & Han, T. A. (2011). Intention recognition with evolution prospection and causal Bayesian networks. In A. Madureira, J. Ferreira, & Z. Vale (Eds.), Computational intelligence for engineering systems: Emergent applications (pp. 1–33). Berlin: Springer.
52. Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4), 515–526.
53. Radke, S., Guroglu, B., & de Bruijn, E. R. A. (2012). There's something about a fair split: Intentionality moderates context-based fairness considerations in social decision-making. PLoS ONE, 7(2), e31491.
54. Richerson, P. J., & Boyd, R. (2006). Not by genes alone: How culture transforms human evolution. Chicago: The University of Chicago Press.
55. Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.
56. Robson, A. (1990). Efficiency in evolutionary games: Darwin, Nash, and the secret handshake. Journal of Theoretical Biology, 144(3), 379–396.
57. Roy, O. (2009). Intentions and interactive transformations of decision problems. Synthese, 169(2), 335–349.
58. Sandholm, T. W., & Crites, R. H. (1995). Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37, 147–166.
59. Santos, F. C., & Pacheco, J. M. (2005). Scale-free networks provide a unifying framework for the emergence of cooperation. Physical Review Letters, 95, 098104.
60. Santos, F. C., & Pacheco, J. M. (2011). Risk of collective failure provides an escape from the tragedy of the commons. Proceedings of the National Academy of Sciences of the U.S.A., 108(26), 10421–10425.
61. Santos, F. C., Pacheco, J. M., & Lenaerts, T. (2006). Evolutionary dynamics of social dilemmas in structured heterogeneous populations. Proceedings of the National Academy of Sciences of the U.S.A., 103, 3490–3494.
62. Santos, F. C., Pacheco, J. M., & Skyrms, B. (2011). Co-evolution of pre-play signaling and cooperation. Journal of Theoretical Biology, 274(1), 30–35.
63. Santos, F. C., Santos, M. D., & Pacheco, J. M. (2008). Social diversity promotes the emergence of cooperation in public goods games. Nature, 454, 214–216.
64. Sayama, H., Farrell, D. L., & Dionne, S. D. (2011). The effects of mental model formation on group decision making: An agent-based simulation. Complexity, 16(3), 49–57.
65. Selten, R. (2001). What is bounded rationality? In Bounded rationality: The adaptive toolbox (pp. 13–36). Cambridge, MA: MIT Press.
66. Shennan, S. (2002). Genes, memes and human history: Darwinian archaeology and cultural evolution. London: Thames and Hudson.
67. Sigmund, K. (2010). The calculus of selfishness. Princeton, NJ: Princeton University Press.
68. Simon, H. A. (1957). A behavioral model of rational choice. In Models of man, social and rational: Mathematical essays on rational human behavior in a social setting. New York: Wiley.
69. Simon, H. A. (1990). Invariants of human behavior. Annual Review of Psychology, 41, 1–19.
70. Skyrms, B. (2004). The stag hunt and the evolution of social structure. Cambridge, UK: Cambridge University Press.
71. Skyrms, B. (2010). Signals: Evolution, learning, and information. Oxford, UK: Oxford University Press.
72. Szabó, G., & Fáth, G. (2007). Evolutionary games on graphs. Physics Reports, 446(4–6), 97–216.
73. Szabó, G., & Tőke, C. (1998). Evolutionary prisoner's dilemma game on a square lattice. Physical Review E, 58, 69–73.
74. Takano, M., & Arita, T. (2006). Asymmetry between even and odd levels of recursion in a theory of mind. In Proceedings of ALife X (pp. 405–411).
75. Todd, P. M. (2001). Fast and frugal heuristics for environmentally bounded minds. In Bounded rationality: The adaptive toolbox (pp. 51–70). Cambridge, MA: MIT Press.
76. Tomasello, M. (2008). Origins of human communication. Cambridge, MA: MIT Press.
77. Traulsen, A., & Nowak, M. A. (2006). Evolution of cooperation by multilevel selection. Proceedings of the National Academy of Sciences of the U.S.A., 103(29), 10952.
78. Traulsen, A., Nowak, M. A., & Pacheco, J. M. (2006). Stochastic dynamics of invasion and fixation. Physical Review E, 74, 011909.
79. Traulsen, A., Pacheco, J. M., & Nowak, M. A. (2007). Pairwise comparison and selection temperature in evolutionary game dynamics. Journal of Theoretical Biology, 246, 522–529.
80. Traulsen, A., Semmann, D., Sommerfeld, R. D., Krambeck, H., & Milinski, M. (2010). Human strategy updating in evolutionary games. Proceedings of the National Academy of Sciences of the U.S.A., 107(7), 2962–2966.
81. Trivers, R. L. (1971). The evolution of reciprocal altruism. Quarterly Review of Biology, 46, 35–57.
82. van Hees, M., & Roy, O. (2008). Intentions and plans in decision and game theory. In Reasons and intentions (pp. 207–226). Farnham, UK: Ashgate Publishers.
83. Van Segbroeck, S., de Jong, S., Nowe, A., Santos, F. C., & Lenaerts, T. (2010). Learning to coordinate in complex networks. Adaptive Behavior, 18, 416–427.
84. Van Segbroeck, S., Pacheco, J. M., Lenaerts, T., & Santos, F. C. (2012). Emergence of fairness in repeated group interactions. Physical Review Letters, 108, 158104.
85. West, S. A., Griffin, A. A., & Gardner, A. (2007). Evolutionary explanations for cooperation. Current Biology, 17, R661–R672.
86. Whiten, A. (1991). Natural theories of mind: Evolution, development, and simulation of everyday mindreading. Oxford, UK: B. Blackwell.
87. Woodward, A. L., Sommerville, J. A., Gerson, S., Henderson, A. M. E., & Buresh, J. (2009). The emergence of intention attribution in infancy. In B. H. Ross (Ed.), The psychology of learning and motivation (pp. 187–222). New York: Academic Press.
88. Wu, B., Altrock, P. M., Wang, L., & Traulsen, A. (2010). Universality of weak selection. Physical Review E, 82(4), 046106.
89. Zanlungo, F. (2007). Microscopic dynamics of artificial life systems. Ph.D. thesis, University of Bologna.

Author notes

Contact author.

∗∗ Centro de Inteligência Artificial (CENTRIA), Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal. E-mail: h.anh@campus.fct.unl.pt (T.A.H.); lmp@fct.unl.pt (L.M.P.); franciscocsantos@ist.utl.pt (F.C.S.)

DEI & INESC-ID, Instituto Superior Técnico, TU Lisbon, Taguspark, 2744-016 Porto Salvo, Portugal.

ATP Group, CMAF, Instituto para a Investigação Interdisciplinar, P-1649-003 Lisboa Codex, Portugal.