Misperception influence on zero-determinant strategies in iterated Prisoner’s Dilemma
Models
Consider an IPD game with misperception, such as implementation errors and observation errors [22,23,31]. Due to the misperception, the parameter of the real game changes from \(\omega_1=[T_1,R_1,P_1,S_1]\) to \(\omega_2=[T_2,R_2,P_2,S_2]\), and only player X notices the change. Thus, player Y’s cognition of the parameter is \(\omega_1\), while player X’s cognition of the parameter is \(\omega_2\). In each round, player X chooses a strategy from its strategy set \(\Omega_X=\{\mathbf{p}=[p_{cc},p_{cd},p_{dc},p_{dd}]^T \mid p_{xy}\in[0,1],\ xy\in\{cc,cd,dc,dd\}\}\), where \(p_{xy}\) is player X’s probability of cooperating given the previous outcome \(xy\in\{cc,cd,dc,dd\}\). Similarly, player Y’s strategy set is \(\Omega_Y=\{\mathbf{q}=[q_{cc},q_{dc},q_{cd},q_{dd}]^T \mid q_{xy}\in[0,1],\ xy\in\{cc,dc,cd,dd\}\}\). According to Press and Dyson [7], this game can be characterized by a Markov chain with a state transition matrix \(M=[M_{jk}]_{4\times 4}\) (see “Notations” for details). Denote by \(\mathbf{v}=[v_{cc},v_{cd},v_{dc},v_{dd}]^T\) a probability vector such that \(\mathbf{v}^T M=\mathbf{v}^T\) and \(v_{cc}+v_{cd}+v_{dc}+v_{dd}=1\). Let \(\mathbf{S}^{\omega_i}_X=[R_i,S_i,T_i,P_i]^T\) and \(\mathbf{S}^{\omega_i}_Y=[R_i,T_i,S_i,P_i]^T\), \(i\in\{1,2\}\). The expected utility functions of the players are as follows:
$$\begin{aligned} u_X^{\omega_i}(\mathbf{p},\mathbf{q})=\mathbf{v}\cdot\mathbf{S}^{\omega_i}_X,\quad u_Y^{\omega_i}(\mathbf{p},\mathbf{q})=\mathbf{v}\cdot\mathbf{S}^{\omega_i}_Y,\quad i\in\{1,2\}. \end{aligned}$$
Denote \(G_1=\{\mathbf{P},\boldsymbol{\Omega},\mathbf{u},\omega_1\}\) and \(G_2=\{\mathbf{P},\boldsymbol{\Omega},\mathbf{u},\omega_2\}\), where \(\mathbf{P}=\{X,Y\}\), \(\boldsymbol{\Omega}=\Omega_X\times\Omega_Y\), and \(\mathbf{u}=\{u_X^{\omega_i},u_Y^{\omega_i}\}\), \(i\in\{1,2\}\). Thus, the actual utilities of the players are obtained through \(G_2\), while, in the view of player Y, they are playing game \(G_1\).
In the view of player X, they are playing game \(G_2\), but player X knows that player Y’s cognition is \(G_1\). \(G_1\) and \(G_2\) are shown in Table 2.
Table 2 Utility matrices in IPD games with misperception.
Let \(\mathbf{p}_0=[1,1,0,0]^T\). For \(i\in\{1,2\}\), \(\mathbf{p}=\alpha\mathbf{S}^{\omega_i}_X+\beta\mathbf{S}^{\omega_i}_Y+\gamma\mathbf{1}+\mathbf{p}_0\), where \(\alpha,\beta,\gamma\in\mathbb{R}\), is called a ZD strategy [7] of player X in \(G_i\), since the strategy makes the two players’ expected utilities subject to a linear relation:
$$\begin{aligned} \alpha u_X^{\omega_i}(\mathbf{p},\mathbf{q})+\beta u_Y^{\omega_i}(\mathbf{p},\mathbf{q})+\gamma=0, \end{aligned}$$
for any strategy \(\mathbf{q}\) of player Y. All available ZD strategies for player X in \(G_i\) can be expressed as \(\Xi(\omega_i)=\{\mathbf{p}\in\Omega_X \mid \mathbf{p}=\alpha\mathbf{S}^{\omega_i}_X+\beta\mathbf{S}^{\omega_i}_Y+\gamma\mathbf{1}+\mathbf{p}_0,\ \alpha,\beta,\gamma\in\mathbb{R}\}\). The three special ZD strategies are denoted as follows:
(1) equalizer strategy [7,12]: \(\mathbf{p}=\beta\mathbf{S}^{\omega_i}_Y+\gamma\mathbf{1}+\mathbf{p}_0\);
(2) extortion strategy [7,13]: \(\mathbf{p}=\phi[(\mathbf{S}^{\omega_i}_X-P_i\mathbf{1})-\chi(\mathbf{S}^{\omega_i}_Y-P_i\mathbf{1})]+\mathbf{p}_0,\ \chi\geqslant 1\);
(3) generous strategy [14,15]: \(\mathbf{p}=\phi[(\mathbf{S}^{\omega_i}_X-R_i\mathbf{1})-\chi(\mathbf{S}^{\omega_i}_Y-R_i\mathbf{1})]+\mathbf{p}_0,\ \chi\geqslant 1\).
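As a numerical illustration of the linear relation enforced by a ZD strategy, the following minimal sketch builds one ZD strategy, approximates the stationary vector \(\mathbf{v}\) of the Markov chain, and checks that \(\alpha u_X+\beta u_Y+\gamma\) vanishes for several strategies of player Y. The helper names (`zd_strategy`, `transition_matrix`, `stationary`, `utilities`), the payoff values \([T,R,P,S]=[5,3,1,0]\), the coefficients \(\alpha=0.1,\beta=-0.2,\gamma=0.1\), and the use of power iteration are assumptions made for this example only. The transition matrix follows the standard Press and Dyson construction rather than the paper’s “Notations” section, which is not reproduced here.

```python
# Illustrative sketch only: helper names and numeric values are assumptions.

def zd_strategy(alpha, beta, gamma, omega):
    """Return p = alpha*S_X + beta*S_Y + gamma*1 + p0 for omega = [T, R, P, S]."""
    T, R, P, S = omega
    s_x = [R, S, T, P]
    s_y = [R, T, S, P]
    p0 = [1, 1, 0, 0]
    return [alpha * sx + beta * sy + gamma + b for sx, sy, b in zip(s_x, s_y, p0)]


def transition_matrix(p, q):
    """States are ordered cc, cd, dc, dd (player X's move first).

    p = [p_cc, p_cd, p_dc, p_dd] and q = [q_cc, q_dc, q_cd, q_dd], as in the text.
    """
    q_state = [q[0], q[2], q[1], q[3]]  # Y's cooperation probability per state
    return [
        [px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
        for px, qy in zip(p, q_state)
    ]


def stationary(M, iters=5000):
    """Approximate v with v^T M = v^T by power iteration from the uniform vector."""
    v = [0.25] * 4
    for _ in range(iters):
        v = [sum(v[j] * M[j][k] for j in range(4)) for k in range(4)]
    return v


def utilities(p, q, omega):
    """Expected utilities (u_X, u_Y) = (v . S_X, v . S_Y)."""
    T, R, P, S = omega
    v = stationary(transition_matrix(p, q))
    u_x = sum(vi * si for vi, si in zip(v, [R, S, T, P]))
    u_y = sum(vi * si for vi, si in zip(v, [R, T, S, P]))
    return u_x, u_y


omega_1 = [5, 3, 1, 0]                 # assumed [T, R, P, S]
alpha, beta, gamma = 0.1, -0.2, 0.1    # assumed coefficients
p = zd_strategy(alpha, beta, gamma, omega_1)  # approximately [0.8, 0.1, 0.6, 0.0]

for q in ([0.9, 0.5, 0.3, 0.2], [0.5, 0.5, 0.5, 0.5], [1, 1, 1, 1]):
    u_x, u_y = utilities(p, q, omega_1)
    residual = alpha * u_x + beta * u_y + gamma
    print(q, round(u_x, 4), round(u_y, 4), abs(residual) < 1e-9)
```

With these assumed coefficients the strategy is an extortion strategy with \(\phi=0.1\) and \(\chi=2\), so the relation reads \(u_X-1=2(u_Y-1)\). Power iteration is used only to keep the sketch dependency-free; solving the linear system for \(\mathbf{v}\) directly would work equally well.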
Based on past experience, player Y knows that player X prefers ZD strategies, a setting that has been widely considered in IPD games [7,9]. If player Y noticed the change, player X’s utility could decrease [21] or the model could collapse [28]. To avoid this, player X keeps choosing ZD strategies according to \(G_1\), so that its strategy sequence matches player Y’s anticipation. To sum up, in our formulation,
the real game is \(G_2\);
player Y thinks that they are playing game \(G_1\), and player X thinks that they are playing game \(G_2\);
player X knows that player Y’s cognition is \(G_1\);
player Y believes that player X chooses ZD strategies;
player X tends to choose a ZD strategy according to \(G_1\) to avoid player Y’s suspicion of misperception.
In fact, player X can benefit from the misperception through the ZD strategy. For example, player X can adopt a generous strategy in \(G_1\) that not only promotes player Y’s cooperation but also makes player X’s utility higher than that of player Y, provided that the generous strategy is an extortion strategy in \(G_2\). A strategy is beneficial for player X if it maintains a linear relationship between the players’ utilities or improves the supremum or the infimum of player X’s utility in its own cognition. In the following, we analyze player X’s implementation of a ZD strategy in the IPD with misperception; proofs are given in the Supplementary Information.
Invariance of ZD strategy
Player X’s ZD strategies may be preserved in IPD games with misperception arising from implementation errors or observation errors. In particular, player X keeps choosing a ZD strategy \(\mathbf{p}\) in \(G_1\) to avoid player Y’s suspicion about possible misperception. In the view of player X, it can also enforce a linear relationship between the players’ expected utilities if \(\mathbf{p}\) is also a ZD strategy in \(G_2\). The following theorem provides a necessary and sufficient condition for the invariance of the linear relationship between the players’ utilities.
Theorem 1
Any ZD strategy \(\mathbf{p}\) of player X in \(G_1\) is also a ZD strategy in \(G_2\) if and only if
$$\begin{aligned} \frac{R_1-P_1}{2R_1-S_1-T_1}=\frac{R_2-P_2}{2R_2-S_2-T_2}. \end{aligned}\tag{1}$$
If (1) holds, player X can ignore the misperception and choose an arbitrary ZD strategy based on its opponent’s anticipation, since that strategy also leads to a linear relationship between the players’ utilities in \(G_2\), as shown in Fig. 1. Otherwise, player X cannot freely choose ZD strategies based on player Y’s cognition, because some ZD strategy of player X in player Y’s cognition is not a ZD strategy in player X’s cognition. Further, because (1) is symmetric in \(\omega_1\) and \(\omega_2\), any available ZD strategy \(\mathbf{p}\) of player X in \(G_2\) is also a ZD strategy in \(G_1\) if and only if (1) holds. This indicates that \(\Xi(\omega_1)=\Xi(\omega_2)\), so player X can choose any ZD strategy based on its own cognition without arousing the opponent’s suspicion, since the choice is also consistent with player Y’s anticipation. Additionally, the slopes of the linear relations between the players’ utilities may differ between the two games, as also shown in Fig. 1, and player X can benefit from the misperception by choosing a ZD strategy that improves the corresponding slope.
In fact, (1) covers the following two cases:
(1) \(2P_i=T_i+S_i\), \(i\in\{1,2\}\), is a sufficient condition for (1), since both sides of (1) then equal \(1/2\). Thus, when \(2P_i=T_i+S_i\), \(i\in\{1,2\}\), any ZD strategy \(\mathbf{p}\) of player X in \(G_1\) is also a ZD strategy in \(G_2\). Here, \(2P_i=T_i+S_i\) means that the sum of the players’ utilities when both players defect equals the sum when exactly one player defects.
(2) \(R_i+P_i=T_i+S_i\), \(i\in\{1,2\}\), is another sufficient condition for (1), since both sides of (1) then equal \(1\). Thus, when \(R_i+P_i=T_i+S_i\), \(i\in\{1,2\}\), any ZD strategy \(\mathbf{p}\) of player X in \(G_1\) is also a ZD strategy in \(G_2\). Here, \(R_i+P_i=T_i+S_i\) means that the game has a balanced structure in utilities [32]. In this case, the relationship between cooperation rate and efficiency is monotonic, i.e., the higher the cooperation rate of both sides, the greater the efficiency (the sum of the players’ utilities).
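Condition (1) can be checked by hand for the parameters used in Fig. 1. The sketch below does so in exact rational arithmetic. It also re-expresses one ZD strategy of \(G_1\) as a ZD strategy of \(G_2\) by solving \(\mathbf{p}-\mathbf{p}_0=\alpha'\mathbf{S}^{\omega_2}_X+\beta'\mathbf{S}^{\omega_2}_Y+\gamma'\mathbf{1}\) for \((\alpha',\beta',\gamma')\). The function names `invariance_ratio` and `reexpress_zd`, the sample strategy, and the counterexample parameter `omega_3` are hypothetical choices made for illustration; the solver also assumes \(R\ne P\), \(S\ne T\), and \(2R\ne S+T\).

```python
from fractions import Fraction as F

# Illustrative sketch only: function names and sample values are assumptions.

def invariance_ratio(omega):
    """(R - P) / (2R - S - T) for omega = [T, R, P, S]; assumes 2R != S + T."""
    T, R, P, S = omega
    return F(R - P) / F(2 * R - S - T)


def reexpress_zd(p, omega):
    """Try to write p - p0 = a*S_X + b*S_Y + c*1 in the game with parameter omega.

    Returns (a, b, c), or None if p is not a ZD strategy in that game.
    Assumes R != P and S != T.
    """
    T, R, P, S = omega
    d_cc, d_cd, d_dc, d_dd = (F(x) - y for x, y in zip(p, [1, 1, 0, 0]))
    s = (d_cc - d_dd) / F(R - P)       # a + b, from the cc and dd components
    c = d_dd - s * P
    diff = (d_cd - d_dc) / F(S - T)    # a - b, from the cd and dc components
    if s * (S + T) + 2 * c != d_cd + d_dc:  # remaining consistency equation
        return None
    return (s + diff) / 2, (s - diff) / 2, c


omega_1 = [5, 3, 1, 0]                 # [T, R_1, P_1, S] from Fig. 1
omega_2 = [5, F(23, 7), F(1, 7), 0]    # [T, R_2, P_2, S] from Fig. 1
omega_3 = [5, 3, 2, 0]                 # assumed parameter that violates (1)

print(invariance_ratio(omega_1), invariance_ratio(omega_2), invariance_ratio(omega_3))

# A ZD strategy in G_1 with alpha = 1/10, beta = -1/5, gamma = 1/10 (assumed example).
p = [F(4, 5), F(1, 10), F(3, 5), F(0)]
print(reexpress_zd(p, omega_1))
print(reexpress_zd(p, omega_2))
print(reexpress_zd(p, omega_3))
```

In this example the same \(\mathbf{p}\) enforces \(u_X-2u_Y+1=0\) in \(G_1\) and \(13u_X-20u_Y+1=0\) in \(G_2\), so the linear relation survives the misperception while its slope changes. Under `omega_3` the consistency equation fails, so this particular \(\mathbf{p}\) is no longer a ZD strategy there.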
Furthermore, for the three special ZD strategies, player X can also maintain a linear relationship between players’ utilities in the IPD game with misperception.
Figure 1 Player X can also enforce a linear relationship between players’ utilities in its own cognition. Let \(\omega_1=[T,R_1,P_1,S]=[5,3,1,0]\) and \(\omega_2=[T,R_2,P_2,S]=[5,\frac{23}{7},\frac{1}{7},0]\), which satisfy (1). Player X chooses two different ZD strategies in (a) and (b), respectively, and the red lines describe the relationships between players’ utilities in \(G_1\). We randomly generate 100 strategies of player Y, and the blue circles are the corresponding \((u^{\omega_2}_X,u^{\omega_2}_Y)\). Notice that the blue circles indeed lie on a cyan line in both (a) and (b).
Equalizer strategy
By choosing equalizer strategies according to player Y’s cognition, player X can unilaterally set player Y’s utilities, as shown in the following corollary.
Corollary 1
Any equalizer strategy \(\mathbf{p}\) of player X in \(G_1\) is also an equalizer strategy in \(G_2\) if and only if
$$\begin{aligned} \frac{R_1-P_1}{R_2-P_2}=\frac{R_1-T_1}{R_2-T_2}=\frac{R_1-S_1}{R_2-S_2}. \end{aligned}\tag{2}$$
Condition (2) is also sufficient for (1). If (2) holds, player X can unilaterally set player Y’s utility by choosing any equalizer strategy in \(G_1\), even though the players have different cognitions; otherwise, player X cannot freely choose an equalizer strategy based on player Y’s cognition, since it may not be an equalizer strategy in player X’s cognition.
Extortion strategy
By choosing extortion strategies according to player Y’s cognition, player X can get an extortionate share, as shown in the following corollary.
Corollary 2
For player X’s extortion strategy \(\mathbf{p}\) with extortion factor \(\chi>1\) in \(G_1\), \(\mathbf{p}\) is also an extortion strategy in \(G_2\) if (1) and the following inequality hold:
$$\begin{aligned} (S_1-P_1)(R_2-P_2)-(R_1-P_1)(T_2-P_2)-\chi((T_1-P_1)(R_2-P_2)-(R_1-P_1)(T_2-P_2))\ [\ldots] \end{aligned}$$
[…] 1) in \(G_1\), \(\mathbf{p}\) is also a generous strategy in \(G_2\) if (1) and the following inequality hold:
$$\begin{aligned} (S_1-R_1)(R_2-P_2)-(R_1-P_1)(T_2-R_2)-\chi((T_1-R_1)(R_2-P_2)-(R_1-P_1)(T_2-R_2))\ [\ldots] \end{aligned}$$
[…]
$$\begin{aligned} [\ldots]\ b^1_i,\quad i\in\{1,2\}, \end{aligned}\tag{5}$$
where \(a^1_i\) and \(b^1_i\), \(i\in\{1,2\}\), are parameters shown in “Notations”.
Actually, when player Y chooses the always-cooperate (ALLC) strategy [35], i.e., \(\mathbf{q}=[1,1,1,1]^T\), player X gets the supremum of the expected utility in \(G_1\), and player X’s utility is improved in the IPD game with misperception.
Figure 4 Player X can use either equalizer or extortion strategies to raise the supremum of its expected utility, or generous strategies to raise the infimum of its expected utility. (a) and (b) consider \(\omega_1=[T,R_1,P,S]\) and \(\omega_2=[T,R_2,P,S]\), where \(R_1\ne R_2\); (c) considers \(\omega_1=[T,R,P_1,S]\) and \(\omega_2=[T,R,P_2,S]\), where \(P_1\ne P_2\). The red lines in (a), (b), and (c) describe the relationships between the players’ utilities when player X chooses an equalizer strategy, an extortion strategy, and a generous strategy in \(G_1\), respectively; the yellow area contains all possible relationships between the players’ utilities in \(G_2\) if player X does not change its strategy. In (a) and (b), \(r\) is the supremum of player X’s utility in \(G_1\), and \(r'\) is lower than the supremum of player X’s utility in \(G_2\); in (c), \(l\) is the infimum of player X’s utility in \(G_1\), and \(l'\) is lower than the infimum of player X’s utility in \(G_2\).
Extortion strategy
By choosing extortion strategies according to player Y’s cognition, player X can also improve the supremum of its expected utility.
Corollary 5
For player X’s extortion strategy \(\mathbf{p}\) with extortion factor \(\chi>1\) in \(G_1\), the supremum of player X’s expected utility in \(G_2\) is larger than that in \(G_1\) if
$$\begin{aligned} a^2_i\chi^2+b^2_i\chi+c^2_i\ [\ldots] \end{aligned}$$
[…] 1), the infimum of player X’s expected utility in \(G_2\) is larger than that in \(G_1\) if
$$\begin{aligned} a^3_i\chi^2+b^3_i\chi+c^3_i\ [\ldots] \end{aligned}$$