Friendly-rivalry solutions for direct reciprocity

Direct reciprocity is one of the key mechanisms accounting for cooperation in our social life. According to recent understanding, most of the classical strategies for direct reciprocity fall into one of two classes, 'partners' or 'rivals.' A 'partner' is a generous strategy that achieves mutual cooperation, and a 'rival' never lets the co-player become better off. The two classes work well under different conditions: for example, partners perform well in a large population, whereas rivals perform well in head-to-head matches. By exhaustive enumeration on a supercomputer, we demonstrate the existence of strategies that are partners as well as rivals, called 'friendly rivals.' Among them, we focus on a human-interpretable strategy, named 'CAPRI' after its five characteristic ingredients, i.e., cooperate, accept, punish, recover, and defect otherwise. Our evolutionary simulation shows excellent performance of CAPRI regardless of environmental conditions.


Introduction
The theory of repeated games is one of the most fundamental mathematical frameworks and has long been studied to understand how and why cooperation emerges in human and biological communities. A prominent example is the iterated prisoner's dilemma (PD) game: It describes a social dilemma between two players, say, Alice and Bob, in which each player has two options, 'cooperation' (c) and 'defection' (d). With rows for Alice's action and columns for Bob's, the payoff matrix is defined as

$$
\begin{array}{c|cc}
  & c & d \\ \hline
c & (R, R) & (S, T) \\
d & (T, S) & (P, P)
\end{array}
$$

where each entry shows (Alice's payoff, Bob's payoff) with T > R > P > S and 2R > T + S. In a simplified version, the PD game is parametrized as T = b, R = b − c, P = 0, and S = −c, where b and c are the benefit and cost of cooperation, respectively, with b > c > 0. If the PD game is repeated with sufficiently high probability, cooperation becomes a feasible solution because a player can reward and/or punish the co-player in subsequent rounds. This is known as direct reciprocity, one of the most well-known mechanisms for the evolution of cooperation.
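As a quick sanity check of the simplified parametrization, the following Python sketch builds the four elementary payoffs from b and c and verifies the two defining inequalities of the PD game. The function names and the sample values b = 2 and c = 1 are illustrative assumptions, not taken from the original work.

```python
def donation_payoffs(b, c):
    """Return the elementary PD payoffs for benefit b and cost c."""
    if not b > c > 0:
        raise ValueError("the parametrization requires b > c > 0")
    return {"T": b, "R": b - c, "P": 0, "S": -c}


def is_prisoners_dilemma(p):
    """Check the two conditions T > R > P > S and 2R > T + S."""
    return p["T"] > p["R"] > p["P"] > p["S"] and 2 * p["R"] > p["T"] + p["S"]


payoffs = donation_payoffs(b=2, c=1)  # illustrative values only
print(payoffs, is_prisoners_dilemma(payoffs))
```

Under this parametrization both conditions reduce to b > c > 0, since 2R − (T + S) = b − c.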
According to recent understanding, most well-known strategies for direct reciprocity fall into one of two classes, 'partners' or 'rivals' (Hilbe et al., 2018). Borrowed from everyday language, the terms 'partner' and 'rival' are defined as follows. When both players, say Alice and Bob, use a partner strategy, $\pi_A = \pi_B = R$ is realized even with infinitesimal error, where $\pi_A$ and $\pi_B$ represent the long-term average payoffs of the players. However, when one of the players, say Bob, deviates from cooperation, Alice will punish Bob so that his payoff becomes less than R. This means that one of the best responses to a partner strategy is the same partner strategy, so that the pair forms a Nash equilibrium. If a player uses a rival strategy, on the other hand, the player aims at a payoff higher than or equal to the co-player's regardless of the co-player's strategy. Thus, as long as Alice is a rival, it is guaranteed that $\pi_A \geq \pi_B$. Note that these two definitions impose no restriction on Bob's strategy, which means that the inequalities are unaffected even if Bob remembers arbitrarily many previous rounds. A schematic diagram of the strategy space is shown in Fig. 1.

Figure 1: A schematic diagram of the strategy space. Strategies that tend to cooperate (defect) are shown on the left (right). The blue (red) ellipse represents a set of partner (rival) strategies. We found that the intersection, called 'friendly rivals', indeed exists and shows excellent performance in evolutionary games.
Which of these two traits is favored by selection depends on environmental conditions, such as the population size N and the elementary payoffs R, T, S, and P. For instance, a large population tends to adopt partner strategies when R is high enough. A natural question is whether a single strategy can be both a partner and a rival simultaneously. Let us call such a strategy a 'friendly rival' hereafter. If such a strategy exists, mutual cooperation is realized while the player is assured of never being beaten by any kind of opponent. However, constructing a friendly rival is not easy because the requirements for being a partner and for being a rival are seemingly contradictory: Knowing that error occurs with probability $e \ll 1$, Alice must forgive Bob's erroneous defection to be a partner and punish his malicious defection to be a rival, without knowing Bob's intention. This is the crux of the matter in relationships.

Main results
In this work (Murase and Baek, 2020), we exhaustively searched the memory-3 strategy space, which amounts to $2^{64} \approx 1.84 \times 10^{19}$ strategies in total, by massive supercomputing. As a result, we found about 7 trillion friendly rivals. Although they make up only a tiny fraction of the entire strategy space, the space indeed contains diverse friendly-rival strategies. Among these strategies, we discovered one that is fairly easy to interpret, named 'CAPRI' after the first letters of its five constitutive rules listed below:
1. Cooperate at mutual cooperation.
2. Accept punishment when you mistakenly defected from mutual cooperation.
3. Punish your co-player by defecting once when he defected from mutual cooperation.
4. Recover cooperation when you or your co-player cooperated at mutual defection.
5. In all the other cases, defect.
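The size of the search space can be reproduced by simple counting. The sketch below assumes that a memory-3 strategy is deterministic and prescribes c or d for every combination of the two players' last three moves; the variable names are illustrative.

```python
from itertools import product

MEMORY = 3
ACTIONS = "cd"

# A memory state is (own last three moves, co-player's last three moves).
histories = ["".join(h) for h in product(ACTIONS, repeat=MEMORY)]
states = [(own, opp) for own in histories for opp in histories]

num_states = len(states)          # 2**(2 * MEMORY) memory states
num_strategies = 2 ** num_states  # one of two actions per state

print(num_states, num_strategies, f"{num_strategies:.3e}")
```

With 64 memory states and two possible actions per state, the count is 2 raised to the 64th power, which is why an exhaustive search calls for supercomputing.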
The first rule is clearly needed to be a partner. In addition, when both Alice and Bob use this strategy, mutual cooperation must be robust against one-bit error, which occurs with probability of O(e). This property is provided by the second and the third rules. Furthermore, for this strategy to be a partner, the players must be able to escape from mutual defection through one-bit error so that the stationary probability distribution does not accumulate at mutual defection, which is handled by the fourth rule. Note that these four rules for being a partner do not necessarily violate the condition for being a rival when the memory length is sufficiently long. CAPRI is actually a rival because of the fifth rule.
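To illustrate how the rules interact, the following Python sketch encodes one possible reading of the five rules as a lookup over memory-3 states and replays CAPRI against itself with a single implementation error. The concrete list of states in `COOPERATE_AT` is an assumption made for illustration and is not quoted from the text above; the helper names are hypothetical as well.

```python
# Histories are strings of the last three moves, oldest first.
# A state is (own history, co-player's history).
# ASSUMPTION: this state list is one reading of the five rules,
# not a table quoted from the source.
COOPERATE_AT = {
    ("ccc", "ccc"),                                  # 1. Cooperate
    ("ccd", "ccc"), ("cdc", "ccd"),                  # 2. Accept punishment
    ("dcc", "cdc"), ("ccc", "dcc"),
    ("ccd", "cdc"), ("cdc", "dcc"), ("dcc", "ccc"),  # 3. after punishing once
    ("ddd", "ddc"), ("ddc", "ddd"), ("ddc", "ddc"),  # 4. Recover
    ("ddc", "dcc"), ("dcc", "ddc"), ("dcc", "dcc"),
}


def capri(own, opp):
    """Return 'c' in the listed states and 'd' otherwise (rule 5)."""
    return "c" if (own, opp) in COOPERATE_AT else "d"


def self_play(hist_a, hist_b, rounds, error_at=None):
    """Play CAPRI against CAPRI; flip Alice's move in round `error_at`."""
    trace = []
    for t in range(rounds):
        a = capri(hist_a, hist_b)
        b = capri(hist_b, hist_a)
        if t == error_at:
            a = "d" if a == "c" else "c"  # one-bit implementation error
        hist_a, hist_b = hist_a[1:] + a, hist_b[1:] + b
        trace.append(a + b)  # Alice's move followed by Bob's move
    return trace


print(self_play("ccc", "ccc", 6, error_at=0))
print(self_play("ddd", "ddd", 5, error_at=0))
```

In this reading, a single erroneous defection from mutual cooperation is answered by exactly one round of punishment, and a single erroneous cooperation at mutual defection is enough to bring both players back to mutual cooperation.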
Although a rival strategy is never outperformed by the opponent, this does not necessarily guarantee success in evolutionary games, where everyone is pitted against everyone else in the population. For example, it is well known that extortionate ZD strategies, which are rivals, perform poorly in an evolutionary game. In general, the evolutionary robustness (Stewart and Plotkin, 2013) of a strategy depends on the environmental conditions: Partner strategies have this property when N is large enough, whereas rival strategies have it when N is small. Friendly rivals have the virtue of both: They keep evolutionary robustness regardless of N, as will be shown in the following.
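Evolutionary robustness is commonly phrased in terms of fixation probabilities: a resident strategy is robust if no single mutant fixes with probability above the neutral value 1/N. The sketch below computes such a fixation probability under an assumed pairwise-comparison (Fermi) imitation rule with selection strength `sigma`. This model choice, the function name, and the parameter names are illustrative assumptions and are not specified in the text.

```python
import math


def fixation_probability(a_mm, a_mr, a_rm, a_rr, N, sigma=1.0):
    """Fixation probability of one mutant among N - 1 residents.

    a_xy is the long-term payoff of an x-player against a y-player
    (m = mutant, r = resident). ASSUMPTION: pairwise-comparison
    imitation with a Fermi function in a well-mixed population.
    """
    total, log_prod = 1.0, 0.0
    for j in range(1, N):  # j = current number of mutants
        pi_m = ((j - 1) * a_mm + (N - j) * a_mr) / (N - 1)
        pi_r = (j * a_rm + (N - j - 1) * a_rr) / (N - 1)
        log_prod -= sigma * (pi_m - pi_r)  # log of backward/forward ratio
        total += math.exp(log_prod)
    return 1.0 / total


N = 10
neutral = fixation_probability(1, 1, 1, 1, N)   # all payoffs equal
invader = fixation_probability(0, 0, 0, 1, N)   # mutant earns less than residents
print(neutral, invader, invader < 1 / N)
```

In this sketch, a mutant that never earns more than the resident and earns strictly less than the residents earn among themselves fixes with probability below 1/N for any N of at least 3, which mirrors the intuition for why a friendly rival can be robust for both small and large populations.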
We evaluated the performance of CAPRI in evolutionary games using a standard stochastic model, where selection takes place through an imitation process in a well-mixed population of size N. (See Murase and Baek (2020) for details.) As shown in Fig. 2, CAPRI overwhelms other strategies under a variety of environmental conditions. We analytically showed that a friendly-rival strategy is evolutionarily robust regardless of N or the benefit-to-cost ratio of cooperation, which explains why CAPRI shows excellent performance in evolutionary settings. Nevertheless, open questions remain, such as whether CAPRI can emerge spontaneously among many other friendly rivals and what the evolutionary path would look like. Lastly, we list other advantages of CAPRI as follows:

1. CAPRI earns strictly higher payoffs against a wide range of non-rival co-players, such as AllC and generous TFT.
2. The strategy is deterministic: A player can implement the strategy without any randomization device.
3. Even if uncertainty exists in b and c, a friendly rival retains its power because its principles are independent of (R, T, S, P). This is a distinct feature compared with the ZD strategies, whose cooperation probabilities have to be calculated from the elementary payoffs.
4. The results can be generalized beyond the PD game: CAPRI works for other social dilemmas, including the snowdrift game and the stag-hunt game.
5. CAPRI can be extended to n-person social dilemmas (Murase and Baek, 2021).
Considering these strengths, we propose that CAPRI can provide an extremely useful set of guidelines for direct reciprocity.