Misperception influence on zero-determinant strategies in iterated Prisoner’s Dilemma

Models

Consider an IPD game with misperception, such as implementation errors and observation errors^22,23,31. Due to the misperception, the parameter of the real game changes from \(\omega_1=[T_1,R_1,P_1,S_1]\) to \(\omega_2=[T_2,R_2,P_2,S_2]\), and only player X notices the change. Thus, player Y’s cognition of the parameter is \(\omega_1\), while player X’s cognition of the parameter is \(\omega_2\). In each round, player X chooses a strategy from its strategy set \(\Omega_X=\{\mathbf{p}=[p_{cc},p_{cd},p_{dc},p_{dd}]^T \mid p_{xy}\in[0,1],\ xy\in\{cc,cd,dc,dd\}\}\), i.e., \(p_{xy}\) is player X’s probability of cooperating given the previous outcome \(xy\in\{cc,cd,dc,dd\}\). Similarly, player Y’s strategy set is \(\Omega_Y=\{\mathbf{q}=[q_{cc},q_{dc},q_{cd},q_{dd}]^T \mid q_{xy}\in[0,1],\ xy\in\{cc,dc,cd,dd\}\}\). According to Press and Dyson⁷, this game can be characterized by a Markov chain with a state transition matrix \(M=[M_{jk}]_{4\times 4}\) (see “Notations” for details). Denote by \(\mathbf{v}=[v_{cc},v_{cd},v_{dc},v_{dd}]^T\) a probability vector such that \(\mathbf{v}^T M=\mathbf{v}^T\) and \(v_{cc}+v_{cd}+v_{dc}+v_{dd}=1\). Let \(\mathbf{S}^{\omega_i}_{X}=[R_i,S_i,T_i,P_i]^T\) and \(\mathbf{S}^{\omega_i}_{Y}=[R_i,T_i,S_i,P_i]^T\), \(i\in\{1,2\}\). The expected utility functions of the players are as follows:

$$\begin{aligned} u_X^{\omega_i}(\mathbf{p},\mathbf{q})=\mathbf{v}\cdot\mathbf{S}^{\omega_i}_{X},\quad u_Y^{\omega_i}(\mathbf{p},\mathbf{q})=\mathbf{v}\cdot\mathbf{S}^{\omega_i}_{Y},\quad i\in\{1,2\}. \end{aligned}$$

Denote \(G_1=\{\mathbf{P},\boldsymbol{\Omega},\mathbf{u},\omega_1\}\) and \(G_2=\{\mathbf{P},\boldsymbol{\Omega},\mathbf{u},\omega_2\}\), where \(\mathbf{P}=\{X,Y\}\), \(\boldsymbol{\Omega}=\Omega_X\times\Omega_Y\), and \(\mathbf{u}=\{u_X^{\omega_i},u_Y^{\omega_i}\}\), \(i\in\{1,2\}\). Thus, the actual utilities of the players are obtained through \(G_2\), while in the view of player Y, they are playing game \(G_1\). In the view of player X, they are playing game \(G_2\), but player X knows that player Y’s cognition is \(G_1\). \(G_1\) and \(G_2\) are shown in Table 2.
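The expected utilities above can be evaluated numerically. The following is a minimal sketch, assuming the standard Press and Dyson form of the transition matrix (the paper’s own definition of \(M\) is in “Notations”, which is not reproduced here). The function names `transition_matrix`, `stationary`, and `utilities` and the sample strategy \(\mathbf{q}\) are illustrative choices, not taken from the paper.

```python
import numpy as np

def transition_matrix(p, q):
    """4x4 chain over states (cc, cd, dc, dd), written as (X's move, Y's move).

    p = [p_cc, p_cd, p_dc, p_dd] and q = [q_cc, q_dc, q_cd, q_dd], as in the text.
    ASSUMPTION: Press-Dyson form; the paper's own M is given in its "Notations".
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Reorder q so that entry k is Y's cooperation probability in state k.
    q_state = q[[0, 2, 1, 3]]
    return np.column_stack([
        p * q_state,              # next state cc
        p * (1 - q_state),        # next state cd
        (1 - p) * q_state,        # next state dc
        (1 - p) * (1 - q_state),  # next state dd
    ])

def stationary(M):
    """Solve v^T M = v^T together with sum(v) = 1 by least squares."""
    A = np.vstack([M.T - np.eye(4), np.ones((1, 4))])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v

def utilities(p, q, omega):
    """Return (u_X, u_Y) for omega = (T, R, P, S)."""
    T, R, P, S = omega
    v = stationary(transition_matrix(p, q))
    S_X = np.array([R, S, T, P], dtype=float)
    S_Y = np.array([R, T, S, P], dtype=float)
    return float(v @ S_X), float(v @ S_Y)

omega1 = (5.0, 3.0, 1.0, 0.0)   # player Y's cognition
omega2 = (6.0, 5.5, 1.5, 0.0)   # real game, known to player X
p = [0.85, 5 / 12, 5 / 12, 7 / 60]
q = [0.7, 0.2, 0.6, 0.3]        # arbitrary illustrative strategy of player Y
print("G_1:", utilities(p, q, omega1))
print("G_2:", utilities(p, q, omega2))
```

The same stationary vector \(\mathbf{v}\) is used for both games, because the strategies are unchanged and only the payoff vectors differ between \(G_1\) and \(G_2\).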

Table 2 Utility matrices in IPD games with misperception.


Let \(\mathbf{p}_0=[1,1,0,0]^T\). For \(i\in\{1,2\}\), \(\mathbf{p}=\alpha\mathbf{S}^{\omega_i}_{X}+\beta\mathbf{S}^{\omega_i}_{Y}+\gamma\mathbf{1}+\mathbf{p}_0\), where \(\alpha,\beta,\gamma\in\mathbb{R}\), is called a ZD strategy⁷ of player X in \(G_i\), since the strategy subjects the two players’ expected utilities to a linear relation:

$$\begin{aligned} \alpha u_X^{\omega_i}(\mathbf{p},\mathbf{q})+\beta u_Y^{\omega_i}(\mathbf{p},\mathbf{q})+\gamma=0, \end{aligned}$$

for any strategy \(\mathbf{q}\) of player Y. All available ZD strategies for player X in \(G_i\) can be expressed as \(\Xi(\omega_i)=\{\mathbf{p}\in\Omega_X \mid \mathbf{p}=\alpha\mathbf{S}^{\omega_i}_{X}+\beta\mathbf{S}^{\omega_i}_{Y}+\gamma\mathbf{1}+\mathbf{p}_0,\ \alpha,\beta,\gamma\in\mathbb{R}\}\). The three special ZD strategies are denoted as:

(1) equalizer strategy^7,12: \(\mathbf{p}=\beta\mathbf{S}^{\omega_i}_{Y}+\gamma\mathbf{1}+\mathbf{p}_0\);
(2) extortion strategy^7,13: \(\mathbf{p}=\phi[(\mathbf{S}^{\omega_i}_{X}-P_i\mathbf{1})-\chi(\mathbf{S}^{\omega_i}_{Y}-P_i\mathbf{1})]+\mathbf{p}_0,\ \chi\geqslant 1\);
(3) generous strategy^14,15: \(\mathbf{p}=\phi[(\mathbf{S}^{\omega_i}_{X}-R_i\mathbf{1})-\chi(\mathbf{S}^{\omega_i}_{Y}-R_i\mathbf{1})]+\mathbf{p}_0,\ \chi\geqslant 1\).
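As an illustration of these definitions, the sketch below builds a ZD strategy from \((\alpha,\beta,\gamma)\) and an extortion strategy from \((\phi,\chi)\), and checks that the result lies in \([0,1]^4\), i.e., that it belongs to \(\Omega_X\). The helper names and the chosen values \(\phi=\frac{1}{26}\), \(\chi=3\) are assumptions made for the example; with \(\omega_1=[5,3,1,0]\) they give the strategy \([\frac{11}{13},\frac{1}{2},\frac{7}{26},0]\).

```python
import numpy as np

P0 = np.array([1.0, 1.0, 0.0, 0.0])  # p_0 in the text

def payoff_vectors(omega):
    """Return (S_X, S_Y) for omega = (T, R, P, S)."""
    T, R, P, S = omega
    return np.array([R, S, T, P], dtype=float), np.array([R, T, S, P], dtype=float)

def zd_strategy(omega, alpha, beta, gamma):
    """p = alpha*S_X + beta*S_Y + gamma*1 + p_0."""
    S_X, S_Y = payoff_vectors(omega)
    return alpha * S_X + beta * S_Y + gamma + P0

def extortion_strategy(omega, phi, chi):
    """p = phi*[(S_X - P*1) - chi*(S_Y - P*1)] + p_0."""
    T, R, P, S = omega
    S_X, S_Y = payoff_vectors(omega)
    return phi * ((S_X - P) - chi * (S_Y - P)) + P0

def is_feasible(p, tol=1e-12):
    """True when every entry of p is a probability."""
    return bool(np.all(p >= -tol) and np.all(p <= 1 + tol))

omega1 = (5, 3, 1, 0)
p_ext = extortion_strategy(omega1, phi=1 / 26, chi=3)
print(p_ext, is_feasible(p_ext))  # approximately [11/13, 1/2, 7/26, 0], True
```

An extortion strategy is the ZD strategy with \(\alpha=\phi\), \(\beta=-\phi\chi\), and \(\gamma=\phi P_i(\chi-1)\), so both constructors return the same vector for matching arguments. Not every \((\alpha,\beta,\gamma)\) yields a feasible strategy, which is why \(\Xi(\omega_i)\) is restricted to \(\Omega_X\).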

Based on past experience, player Y knows that player X prefers ZD strategies, an assumption that has been widely adopted in IPD games^7,9. To prevent player Y from noticing the change, which may decrease player X’s utility²¹ or cause the model to collapse²⁸, player X keeps choosing ZD strategies according to \(G_1\), so that the strategy sequence matches player Y’s anticipation. To sum up, in our formulation,

the real game is \(G_2\);
player Y thinks that they are playing game \(G_1\), and player X thinks that they are playing game \(G_2\);
player X knows that player Y’s cognition is \(G_1\);
player Y believes that player X chooses ZD strategies;
player X tends to choose a ZD strategy according to \(G_1\) to avoid player Y’s suspicion of misperception.

In fact, player X can benefit from the misperception through the ZD strategy. For example, player X can adopt a generous strategy in \(G_1\) that not only promotes player Y’s cooperation but also makes player X’s utility higher than that of player Y, provided the generous strategy is an extortion strategy in \(G_2\). A beneficial strategy for player X is one that maintains a linear relationship between players’ utilities or improves the supremum or the infimum of its utility in its own cognition. In the following, we analyze player X’s implementation of a ZD strategy in IPD with misperception; proofs are given in the Supplementary Information.

Invariance of ZD strategy

Player X’s ZD strategies may remain ZD strategies in IPD games with misperception arising from implementation errors or observation errors. In particular, player X keeps choosing a ZD strategy \(\mathbf{p}\) in \(G_1\) to avoid player Y’s suspicion about possible misperception. In the view of player X, it can also subject players’ expected utilities to a linear relationship if \(\mathbf{p}\) is also a ZD strategy in \(G_2\). The following theorem provides a necessary and sufficient condition for the invariance of the linear relationship between players’ utilities.

Theorem 1

Any ZD strategy \(\mathbf{p}\) of player X in \(G_1\) is also a ZD strategy in \(G_2\) if and only if

$$\begin{aligned} \frac{R_1-P_1}{2R_1-S_1-T_1}=\frac{R_2-P_2}{2R_2-S_2-T_2}. \end{aligned} \tag{1}$$

If (1) holds, player X can ignore the misperception and choose an arbitrary ZD strategy based on its opponent’s anticipation, since it also leads to a linear relationship between players’ utilities, as shown in Fig. 1; otherwise, player X cannot freely choose ZD strategies based on player Y’s cognition, because some ZD strategy of player X in player Y’s cognition is not a ZD strategy in player X’s cognition. Further, because (1) is symmetric in \(\omega_1\) and \(\omega_2\), any available ZD strategy \(\mathbf{p}\) of player X in \(G_2\) is also a ZD strategy in \(G_1\) if and only if (1) holds. This indicates that \(\Xi(\omega_1)=\Xi(\omega_2)\), so player X can choose any ZD strategy based on its own cognition without arousing the opponent’s suspicion, since it is also consistent with player Y’s anticipation. Additionally, the slopes of the linear relations between players’ utilities may differ, as also shown in Fig. 1, and player X can benefit from the misperception by choosing a ZD strategy that improves the corresponding slope.

In fact, (1) covers the following two cases:

(1) \(2P_i=T_i+S_i\), \(i\in\{1,2\}\), is a sufficient condition of (1). Thus, when \(2P_i=T_i+S_i\), \(i\in\{1,2\}\), any ZD strategy \(\mathbf{p}\) of player X in \(G_1\) is also a ZD strategy in \(G_2\). In fact, \(2P_i=T_i+S_i\), \(i\in\{1,2\}\), means that the sum of players’ utilities under mutual defection equals that when only one player defects.
(2) \(R_i+P_i=T_i+S_i\), \(i\in\{1,2\}\), is another sufficient condition of (1). Thus, when \(R_i+P_i=T_i+S_i\), \(i\in\{1,2\}\), any ZD strategy \(\mathbf{p}\) of player X in \(G_1\) is also a ZD strategy in \(G_2\). In fact, \(R_i+P_i=T_i+S_i\), \(i\in\{1,2\}\), means that the game has a balanced structure in utilities³². In this case, the relationship between cooperation rate and efficiency is monotonic, i.e., the higher the cooperation rate of both sides, the greater the efficiency (the sum of players’ utilities).
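Condition (1) and its two sufficient cases are easy to check with exact rational arithmetic. The sketch below tests (1) in cross-multiplied form, which matches (1) whenever both denominators \(2R_i-S_i-T_i\) are nonzero (an assumption here; it holds under the usual IPD requirement \(2R_i>T_i+S_i\)). The function name and the parameter sets for the two cases are made up for illustration; the other two sets are those of Fig. 1 and Fig. 2.

```python
from fractions import Fraction as F

def zd_sets_coincide(omega1, omega2):
    """Condition (1), cross-multiplied; omega = (T, R, P, S).

    ASSUMPTION: 2R_i - S_i - T_i != 0 for both games.
    """
    T1, R1, P1, S1 = omega1
    T2, R2, P2, S2 = omega2
    return (R1 - P1) * (2 * R2 - S2 - T2) == (R2 - P2) * (2 * R1 - S1 - T1)

fig1 = ((5, 3, 1, 0), (5, F(23, 7), F(1, 7), 0))   # parameters of Fig. 1
fig2 = ((5, 3, 1, 0), (F(13, 2), 6, 1, 0))         # parameters of Fig. 2
case1 = ((5, 4, F(5, 2), 0), (6, 5, 3, 0))         # 2P_i = T_i + S_i (illustrative)
case2 = ((5, 4, 1, 0), (7, 5, 2, 0))               # R_i + P_i = T_i + S_i (illustrative)

for name, (w1, w2) in [("fig1", fig1), ("fig2", fig2), ("case1", case1), ("case2", case2)]:
    print(name, zd_sets_coincide(w1, w2))
```

In case (1) both ratios in (1) equal \(\frac{1}{2}\), and in case (2) both equal 1, which is why each case is sufficient.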

Furthermore, for the three special ZD strategies, player X can also maintain a linear relationship between players’ utilities in the IPD game with misperception.

Figure 1

Player X can also enforce a linear relationship between players’ utilities in its own cognition. Let \(\omega_1=[T,R_1,P_1,S]=[5,3,1,0]\) and \(\omega_2=[T,R_2,P_2,S]=[5,\frac{23}{7},\frac{1}{7},0]\), which satisfy (1). Player X chooses two different ZD strategies in (a) and (b), respectively, and the red lines describe the relationships between players’ utilities in \(G_1\). We randomly generate 100 strategies of player Y, and the blue circles are the corresponding \((u^{\omega_2}_X,u^{\omega_2}_Y)\). Notice that the blue circles indeed lie on a cyan line in both (a) and (b).

Equalizer strategy

By choosing equalizer strategies according to player Y’s cognition, player X can unilaterally set player Y’s utilities, as shown in the following corollary.

Corollary 1

Any equalizer strategy \(\mathbf{p}\) of player X in \(G_1\) is also an equalizer strategy in \(G_2\) if and only if

$$\begin{aligned} \frac{R_1-P_1}{R_2-P_2}=\frac{R_1-T_1}{R_2-T_2}=\frac{R_1-S_1}{R_2-S_2}. \end{aligned} \tag{2}$$

(2) is also a sufficient condition of (1). If (2) holds, player X can unilaterally set player Y’s utility by choosing any equalizer strategy in \(G_1\) even though the players have different cognitions; otherwise, player X cannot freely choose an equalizer strategy based on player Y’s cognition, since it may not be an equalizer strategy in player X’s cognition.

Extortion strategy

By choosing extortion strategies according to player Y’s cognition, player X can get an extortionate share, as shown in the following corollary.

Corollary 2

For player X’s extortion strategy \(\mathbf{p}\) with extortion factor \(\chi>1\) in \(G_1\), \(\mathbf{p}\) is also an extortion strategy in \(G_2\) if (1) and the following inequality hold:

$$\begin{aligned} (S_1-P_1)(R_2-P_2)-(R_1-P_1)(T_2-P_2)-\chi\big((T_1-P_1)(R_2-P_2)-(R_1-P_1)(T_2-P_2)\big)<0. \end{aligned} \tag{3}$$

Player X’s extortion strategy in \(G_1\), whose extortion factor \(\chi\) satisfies (3), also ensures that player X’s utility is not lower than the opponent’s utility in its own cognition. Thus, by choosing a strategy that satisfies (3), player X can still enforce an extortionate share even in the presence of misperception.
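The sufficient condition of Corollary 2 can be evaluated directly from the two parameter vectors. The sketch below computes the left-hand side of (3) exactly and combines it with condition (1), again in cross-multiplied form. The function names are illustrative, and the parameters are those of Fig. 1 with an assumed extortion factor \(\chi=3\).

```python
from fractions import Fraction as F

def condition_1(omega1, omega2):
    """Condition (1), cross-multiplied; omega = (T, R, P, S)."""
    T1, R1, P1, S1 = omega1
    T2, R2, P2, S2 = omega2
    return (R1 - P1) * (2 * R2 - S2 - T2) == (R2 - P2) * (2 * R1 - S1 - T1)

def lhs_3(omega1, omega2, chi):
    """Left-hand side of inequality (3)."""
    T1, R1, P1, S1 = omega1
    T2, R2, P2, S2 = omega2
    a = (S1 - P1) * (R2 - P2) - (R1 - P1) * (T2 - P2)
    b = (T1 - P1) * (R2 - P2) - (R1 - P1) * (T2 - P2)
    return a - chi * b

def stays_extortion(omega1, omega2, chi):
    """Sufficient condition of Corollary 2: (1) holds and (3) holds."""
    return condition_1(omega1, omega2) and lhs_3(omega1, omega2, chi) < 0

omega1 = (5, 3, 1, 0)
omega2 = (5, F(23, 7), F(1, 7), 0)
print(lhs_3(omega1, omega2, chi=3))            # -150/7
print(stays_extortion(omega1, omega2, chi=3))  # True
```

For these parameters the left-hand side of (3) is \(-\frac{90}{7}-\frac{20}{7}\chi\), which is negative for every \(\chi>1\), so every extortion strategy in \(G_1\) passes this test.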

Figure 2

The form of \(\theta\) in the IPD game with misperception. Consider \(\omega_1=[T_1,R_1,P_1,S_1]=[5,3,1,0]\) and \(\omega_2=[T_2,R_2,P_2,S_2]=[\frac{13}{2},6,1,0]\). Suppose \(p_{dd}=0\), since it does not influence nonzero canonical angles. The purple (yellow) plane is the available ZD strategy set in \(G_1\) (\(G_2\)), and the purple (yellow) vector is its normal vector. Clearly, \(\theta\) is the angle between the two normal vectors, which is also the nonzero canonical angle between the available ZD strategy set of \(G_1\) and that of \(G_2\).

Generous strategy

By choosing generous strategies according to player Y’s cognition, player X may also dominate in the game, as reported in the following corollary.

Corollary 3

For player X’s generous strategy \(\mathbf{p}\) with generous factor \(\chi>1\) in \(G_1\), \(\mathbf{p}\) is also a generous strategy in \(G_2\) if (1) and the following inequality hold:

$$\begin{aligned} (S_1-R_1)(R_2-P_2)-(R_1-P_1)(T_2-R_2)-\chi\big((T_1-R_1)(R_2-P_2)-(R_1-P_1)(T_2-R_2)\big)<0. \end{aligned} \tag{4}$$

A generous strategy ensures that the utility of the player using it is not higher than the opponent’s utility, yet the player dominates in evolving games^14,33. Thus, player X’s generous strategy, whose generous factor \(\chi\) satisfies (4) based on player Y’s anticipation, can also dominate in the game in player X’s cognition. It is rational for player X to choose generous strategies that satisfy (4), since the misperception does not change their dominant position.

Figure 3

The relation between the bounds of Theorems 2 and 3 and players’ utilities in \(G_2\). Consider \(\omega_1=[T_1,R_1,P_1,S_1]=[5,3,1,0]\) and \(\omega_2=[T_2,R_2,P_2,S_2]=[6,\frac{11}{2},\frac{3}{2},0]\). Choose \(\mathbf{p}=\alpha\mathbf{S}^{\omega_1}_{X}+\beta\mathbf{S}^{\omega_1}_{Y}+\gamma\mathbf{1}+\mathbf{p}_0\), where \((\alpha,\beta,\gamma)=(\frac{1}{30},-\frac{1}{6},\frac{1}{4})\), and \((\alpha',\beta',\gamma')=(\frac{38}{165},-\frac{94}{165},\frac{151}{165})\). The red lines describe the relationship between players’ utilities in \(G_1\). The green lines describe the bounds according to Theorems 2 and 3. We then randomly generate 200 strategies of player Y, and the blue circles are the corresponding \((u^{\omega_2}_X,u^{\omega_2}_Y)\).

Deviation from misperception

The misperception can lead to a bounded deviation from a linear relationship between players’ expected utilities in player X’s cognition. Player X chooses a ZD strategy to avoid player Y’s suspicion, but it may then fail to enforce a linear relationship between players’ expected utilities in its own cognition. Knowing the size of this deviation helps the player implement strategies. On the one hand, players’ utilities with misperception deviate by a bounded amount from a linear relationship in player X’s cognition. Let \(\theta\) be the nonzero canonical angle³⁴ between the two available ZD strategy sets of \(G_1\) and \(G_2\), as shown in Fig. 2; we then get the following theorem.

Theorem 2

For any ZD strategy \(\mathbf{p}=\alpha\mathbf{S}^{\omega_1}_{X}+\beta\mathbf{S}^{\omega_1}_{Y}+\gamma\mathbf{1}+\mathbf{p}_0\) of player X in \(G_1\), there exist \(\alpha',\beta',\gamma'\) such that

$$\begin{aligned} |\alpha' u_X^{\omega_2}(\mathbf{p},\mathbf{q})+\beta' u_Y^{\omega_2}(\mathbf{p},\mathbf{q})+\gamma'|\leqslant \Vert\mathbf{p}\Vert_2\frac{\Vert L_2\Vert_\infty}{\Vert L_2\Vert_2}\sin\theta,\quad \forall\mathbf{q}, \end{aligned}$$

where \(\Vert\cdot\Vert_2\) is the \(l_2\) norm, \(\Vert\cdot\Vert_\infty\) is the \(l_\infty\) norm, and

$$\begin{aligned} \theta=\arccos\frac{L_1^TL_2}{\Vert L_1\Vert_2\Vert L_2\Vert_2},\quad L_i=[2P_i-S_i-T_i,\ R_i-P_i,\ R_i-P_i,\ T_i+S_i-2R_i]^T,\quad i\in\{1,2\}. \end{aligned}$$

Under misperception, players’ utilities deviate by a bounded amount from a linear relationship in player X’s cognition, namely \(\alpha' u_X+\beta' u_Y+\gamma'=0\), even though this relationship is not enforced exactly by choosing ZD strategies in \(G_1\), as shown in Fig. 3a. By recognizing the difference between \(\omega_1\) and \(\omega_2\), player X is able to calculate bounds on the deviation of players’ utilities caused by misperception.
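The right-hand side of the bound in Theorem 2 depends only on \(\mathbf{p}\), \(\omega_1\), and \(\omega_2\), so player X can compute it in advance. The sketch below evaluates \(L_i\), \(\theta\), and the bound. It does not construct \(\alpha',\beta',\gamma'\), whose derivation is in the Supplementary Information. Function names are illustrative, and the cosine is clipped to \([-1,1]\) only to guard against floating-point rounding.

```python
import numpy as np

def normal_vector(omega):
    """L_i from Theorem 2 for omega = (T, R, P, S)."""
    T, R, P, S = omega
    return np.array([2 * P - S - T, R - P, R - P, T + S - 2 * R], dtype=float)

def canonical_angle(omega1, omega2):
    """theta = arccos(L_1^T L_2 / (||L_1||_2 ||L_2||_2))."""
    L1, L2 = normal_vector(omega1), normal_vector(omega2)
    cos_theta = (L1 @ L2) / (np.linalg.norm(L1) * np.linalg.norm(L2))
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def theorem2_bound(p, omega1, omega2):
    """||p||_2 * ||L_2||_inf / ||L_2||_2 * sin(theta)."""
    L2 = normal_vector(omega2)
    theta = canonical_angle(omega1, omega2)
    ratio = np.linalg.norm(L2, np.inf) / np.linalg.norm(L2)
    return float(np.linalg.norm(p) * ratio * np.sin(theta))

omega1 = (5, 3, 1, 0)
omega2 = (6, 5.5, 1.5, 0)                        # parameters of Fig. 3
p = np.array([0.85, 5 / 12, 5 / 12, 7 / 60])     # ZD strategy of Fig. 3 in G_1
print("theta:", canonical_angle(omega1, omega2))
print("bound:", theorem2_bound(p, omega1, omega2))
```

When condition (1) holds, \(L_1\) and \(L_2\) are parallel, \(\theta=0\), and the bound collapses to zero, which recovers the exact linear relationship of Theorem 1.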

On the other hand, for a given strategy, the deviation from the corresponding linear relationship is also important, while Theorem 2 focuses on the deviation from an existent linear relationship in player X’s cognition. The misperception can also bring players’ utilities a bounded deviation from the corresponding linear relationship of the ZD strategy in player X’s cognition.

Theorem 3

For player X’s ZD strategy \(\mathbf{p}=\alpha\mathbf{S}^{\omega_1}_{X}+\beta\mathbf{S}^{\omega_1}_{Y}+\gamma\mathbf{1}+\mathbf{p}_0\) in \(G_1\), the following inequality holds in \(G_2\):

$$\begin{aligned} \min(\Gamma)\leqslant \alpha u_X^{\omega_2}(\mathbf{p},\mathbf{q})+\beta u_Y^{\omega_2}(\mathbf{p},\mathbf{q})+\gamma\leqslant \max(\Gamma), \end{aligned}$$

where

$$\begin{aligned} \Gamma=\{(\alpha+\beta)(R_2-R_1),\ \alpha(S_2-S_1)+\beta(T_2-T_1),\ \alpha(T_2-T_1)+\beta(S_2-S_1),\ (\alpha+\beta)(P_2-P_1)\}. \end{aligned}$$

Any ZD strategy of player X based on player Y’s cognition keeps players’ utilities within a bounded deviation from the corresponding linear relationship in player X’s cognition, as shown in Fig. 3b. With a ZD strategy \(\mathbf{p}=\alpha\mathbf{S}^{\omega_1}_{X}+\beta\mathbf{S}^{\omega_1}_{Y}+\gamma\mathbf{1}+\mathbf{p}_0\), player X enforces a linear relationship in \(G_1\), i.e., \(\alpha u_X^{\omega_1}(\mathbf{p},\mathbf{q})+\beta u_Y^{\omega_1}(\mathbf{p},\mathbf{q})+\gamma=0\). Since players’ utilities are \(u_X^{\omega_2}\) and \(u_Y^{\omega_2}\) in \(G_2\), \((u_X^{\omega_2},u_Y^{\omega_2})\) has a bounded deviation from the corresponding relationship \(\alpha u_X^{\omega_2}(\mathbf{p},\mathbf{q})+\beta u_Y^{\omega_2}(\mathbf{p},\mathbf{q})+\gamma=0\).
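Theorem 3 can be checked numerically. Writing \(\mathbf{S}^{\omega_2}=\mathbf{S}^{\omega_1}+(\mathbf{S}^{\omega_2}-\mathbf{S}^{\omega_1})\), the expression \(\alpha u_X^{\omega_2}+\beta u_Y^{\omega_2}+\gamma\) equals \(\mathbf{v}\) dotted with the vector of the four elements of \(\Gamma\), and a probability vector cannot take that product outside \([\min(\Gamma),\max(\Gamma)]\). The sketch below samples 200 strategies of player Y with the parameters of Fig. 3. It assumes the Press and Dyson form of the transition matrix, and the helper names, the random seed, and the sampling range for \(\mathbf{q}\) are illustrative.

```python
import numpy as np

def utilities(p, q, omega):
    """(u_X, u_Y) from the stationary distribution.

    ASSUMPTION: Press-Dyson transition matrix over states (cc, cd, dc, dd).
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    qs = q[[0, 2, 1, 3]]  # Y's cooperation probability in each state
    M = np.column_stack([p * qs, p * (1 - qs), (1 - p) * qs, (1 - p) * (1 - qs)])
    A = np.vstack([M.T - np.eye(4), np.ones((1, 4))])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    T, R, P, S = omega
    return float(v @ np.array([R, S, T, P])), float(v @ np.array([R, T, S, P]))

def gamma_set(alpha, beta, omega1, omega2):
    """The four elements of Gamma in Theorem 3."""
    T1, R1, P1, S1 = omega1
    T2, R2, P2, S2 = omega2
    return [
        (alpha + beta) * (R2 - R1),
        alpha * (S2 - S1) + beta * (T2 - T1),
        alpha * (T2 - T1) + beta * (S2 - S1),
        (alpha + beta) * (P2 - P1),
    ]

omega1, omega2 = (5, 3, 1, 0), (6, 5.5, 1.5, 0)
alpha, beta, gamma = 1 / 30, -1 / 6, 1 / 4
p = np.array([0.85, 5 / 12, 5 / 12, 7 / 60])  # ZD strategy in G_1
G = gamma_set(alpha, beta, omega1, omega2)

rng = np.random.default_rng(0)
deviations = []
for _ in range(200):
    q = rng.uniform(0.05, 0.95, size=4)
    uX, uY = utilities(p, q, omega2)
    deviations.append(alpha * uX + beta * uY + gamma)

print("Gamma bounds:", min(G), max(G))
print("observed range:", min(deviations), max(deviations))
```

For these parameters \(\Gamma=\{-\frac{1}{3},-\frac{1}{6},\frac{1}{30},-\frac{1}{15}\}\), so every sampled deviation should lie between \(-\frac{1}{3}\) and \(\frac{1}{30}\).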

Benefit from misperception

Player X is able to take advantage of the misperception since it knows player Y’s cognition. Specifically, in IPD without misperception, for any fixed ZD strategy of player X, its utility depends on the opponent’s strategy and always lies in a closed interval. Player X can benefit from the misperception by choosing a strategy that increases the supremum or the infimum of its own utility in IPD with misperception. For the three special ZD strategies, player X’s ability to improve the supremum/infimum of its own expected utility is shown in Fig. 4, and the following results show how player X chooses beneficial strategies.

Equalizer strategy

By choosing equalizer strategies according to player Y’s cognition, player X can improve the supremum of its expected utility.

Corollary 4

For player X’s equalizer strategy \(\mathbf{p}=\beta\mathbf{S}^{\omega_1}_{Y}+\gamma\mathbf{1}+\mathbf{p}_0\), \(\beta\ne 0\), in \(G_1\), the supremum of player X’s expected utility in \(G_2\) is larger than that in \(G_1\) if

$$\begin{aligned} a^1_i\frac{\gamma}{\beta}>b^1_i,\quad i\in\{1,2\}, \end{aligned} \tag{5}$$

where \(a^1_i\) and \(b^1_i\), \(i\in\{1,2\}\), are parameters shown in “Notations”.

In fact, when player Y chooses the always cooperate (ALLC) strategy³⁵, i.e., \(\mathbf{q}=[1,1,1,1]^T\), player X gets the supremum of the expected utility in \(G_1\), and player X’s utility is improved in the IPD game with misperception.

Figure 4

Player X can use either equalizer or extortion strategies to raise the supremum of its expected utility, or generous strategies to raise the infimum of its expected utility. (a) and (b) consider \(\omega_1=[T,R_1,P,S]\) and \(\omega_2=[T,R_2,P,S]\), where \(R_1\ne R_2\); (c) considers \(\omega_1=[T,R,P_1,S]\) and \(\omega_2=[T,R,P_2,S]\), where \(P_1\ne P_2\). The red lines in (a), (b), and (c) describe the relationships between utilities when player X chooses an equalizer strategy, an extortion strategy, and a generous strategy in \(G_1\), respectively; the yellow area contains all possible relationships between players’ utilities in \(G_2\) if player X does not change its strategy. In (a) and (b), \(r\) is the supremum of player X’s utility in \(G_1\), and \(r'\) is lower than the supremum of player X’s utility in \(G_2\); in (c), \(l\) is the infimum of player X’s utility in \(G_1\), and \(l'\) is lower than the infimum of player X’s utility in \(G_2\).

Extortion strategy

By choosing extortion strategies according to player Y’s cognition, player X can also improve the supremum of its expected utility.

Corollary 5

For player X’s extortion strategy \(\mathbf{p}\) with extortion factor \(\chi>1\) in \(G_1\), the supremum of player X’s expected utility in \(G_2\) is larger than that in \(G_1\) if

$$\begin{aligned} a^2_i\chi^2+b^2_i\chi+c^2_i<0,\quad i\in\{1,2\}, \end{aligned} \tag{6}$$

where \(a^2_i\), \(b^2_i\), and \(c^2_i\), \(i\in\{1,2\}\), are parameters shown in “Notations”.

If player Y aims solely to maximize its own utility, player Y chooses the ALLC strategy when player X chooses extortion strategies⁷. In this case, by choosing an extortion strategy that satisfies (6), player X gets the supremum of the expected utility in \(G_1\), and player X’s utility is improved in the IPD game with misperception.
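The effect can be seen on a small example by comparing player X’s utility against ALLC in \(G_1\) and in \(G_2\). The sketch below does not evaluate condition (6), because the parameters \(a^2_i\), \(b^2_i\), \(c^2_i\) are defined in “Notations” and are not reproduced here; it only computes the two utilities directly. It assumes the Press and Dyson form of the transition matrix, the extortion strategy with \(\phi=\frac{1}{26}\) and \(\chi=3\) in \(G_1\), and the parameter pair of Fig. 1, all chosen for illustration.

```python
import numpy as np

def utility_X(p, q, omega):
    """Player X's expected utility u_X for omega = (T, R, P, S).

    ASSUMPTION: Press-Dyson transition matrix over states (cc, cd, dc, dd).
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    qs = q[[0, 2, 1, 3]]  # Y's cooperation probability in each state
    M = np.column_stack([p * qs, p * (1 - qs), (1 - p) * qs, (1 - p) * (1 - qs)])
    A = np.vstack([M.T - np.eye(4), np.ones((1, 4))])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    T, R, P, S = omega
    return float(v @ np.array([R, S, T, P]))

omega1 = (5, 3, 1, 0)             # player Y's cognition
omega2 = (5, 23 / 7, 1 / 7, 0)    # real game
p_ext = [11 / 13, 1 / 2, 7 / 26, 0]   # extortion strategy in G_1 (phi = 1/26, chi = 3)
allc = [1, 1, 1, 1]

u1 = utility_X(p_ext, allc, omega1)
u2 = utility_X(p_ext, allc, omega2)
print(u1, u2)  # approximately 41/11 and 43/11
```

Against ALLC the chain only visits the states cc and dc, with stationary weights \(\frac{7}{11}\) and \(\frac{4}{11}\), so the gain of player X here comes entirely from \(R_2>R_1\).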

Generous strategy

By choosing generous strategies according to player Y’s cognition, player X can also improve the infimum of its expected utility.

Corollary 6

For player X’s generous strategy \(\mathbf{p}\) with \(\chi>1\), the infimum of player X’s expected utility in \(G_2\) is larger than that in \(G_1\) if

$$\begin{aligned} a^3_i\chi^2+b^3_i\chi+c^3_i<0,\quad i\in\{1,2\}, \end{aligned} \tag{7}$$

where \(a^3_i\), \(b^3_i\), and \(c^3_i\), \(i\in\{1,2\}\), are parameters shown in “Notations”.

When player X chooses generous strategies, player Y may choose the always defect (ALLD) strategy³⁵, i.e., \(\mathbf{q}=[0,0,0,0]^T\), which is the worst situation for player X since it gets the minimum expected utility in \(G_1\). In this case, player X is able to improve its expected utility in the worst situation.

Source: Ecology - nature.com
