### Agent-resource affiliation networks

We consider games involving populations of *agents* that extract from multiple common-pool *sources* (which term we use for nodes representing resources in accord with previous related work^{15,16}). Agents’ access to sources is defined by bipartite networks, wherein a link between an agent and a source indicates that the agent can access that source. This access is determined by some exogenous factors and remains fixed in time. The set of agents affiliated with a particular source \(s\) is denoted as \(\mathbf{A}_{s}\), while the set of sources affiliated with a particular agent \(a\) is denoted as \(\mathbf{S}_{a}\). The degree of an agent \(a\) is denoted by \(m(a)\), and the degree of a source \(s\) by \(n(s)\).
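As an illustration of this notation, the following minimal sketch builds the two affiliation maps from a list of agent-source links. The toy link list and the helper name `build_affiliations` are hypothetical and serve only to make the definitions of \(\mathbf{S}_{a}\), \(\mathbf{A}_{s}\), \(m(a)\), and \(n(s)\) concrete.

```python
from collections import defaultdict

# Hypothetical toy network: each (agent, source) pair is one access link.
links = [("a1", "s1"), ("a1", "s2"), ("a2", "s2"), ("a3", "s2"), ("a3", "s3")]

def build_affiliations(links):
    """Return (S, A) where S[a] is the set of sources of agent a
    and A[s] is the set of agents of source s."""
    S = defaultdict(set)
    A = defaultdict(set)
    for a, s in links:
        S[a].add(s)
        A[s].add(a)
    return dict(S), dict(A)

S, A = build_affiliations(links)
m = {a: len(sources) for a, sources in S.items()}  # agent degrees m(a)
n = {s: len(agents) for s, agents in A.items()}    # source degrees n(s)
```

Because every link joins exactly one agent to one source, the agent degrees and the source degrees both sum to the number of links.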

To explore the effects of network topology upon extraction dynamics and wealth distributions, we generate ensembles of \(10^{3}\) networks, each having \(50\) agents and \(50\) sources and sharing mean agent degree \(\langle m\rangle =5\) and mean source degree \(\langle n\rangle =5\). All networks thus share the same total numbers of agents, sources, and links, but differ in how these links are distributed among agents and sources. We generate nine network ensembles, each representing a particular combination of one of three types of degree heterogeneity in its source degree distribution (**U**: uniform-degree, **L**: low-heterogeneity, or **H**: high-heterogeneity) with one of three similar distributions of agent degree (**u**, **l**, or **h**^{39}) (Supplementary Information S1.1). Degree histograms, averaged over each ensemble, provide a representative *source degree distribution* \(P_{\mathbf{S}}(n)\) and *agent degree distribution* \(P_{\mathbf{A}}(m)\) for each network type (Fig. 2a and b). The simulation results depend primarily on the degree distributions of agents and sources rather than on the overall size of the networks used (Supplementary Information S3.1).

### Networked CPR extraction game

On these networks, we simulate iterative games in which agents vary the extraction effort that they apply to their affiliated sources, altering the quality of these sources; in turn, these changes in source quality then influence how agents adapt their extraction levels in subsequent rounds. The **extraction effort** exerted by agent \(a\) upon its affiliated source \(s\) is denoted as \(q(a,s)\). The total effort exerted by an agent \(a\), its **individual extraction**, is denoted by \(\overleftarrow{q}(a)=\sum_{s\in \mathbf{S}_{a}}q(a,s)\). The total effort exerted upon source \(s\), or its **collective extraction**, is denoted by \(\overrightarrow{q}(s)=\sum_{a\in \mathbf{A}_{s}}q(a,s)\). The **quality** of a source \(s\) is quantified by the benefit \(b(s)\) per unit extraction effort applied that the source provides. The cost associated with extraction is given by a convex (quadratic) function of \(\overleftarrow{q}(a)\), such that marginal costs increase with individual extraction^{15,16}. In addition to modelling the increasing costs (i.e., diminishing returns) associated with the physical act of extraction itself, this could also reflect escalating, informal social penalties that result from increasing extraction (i.e., “graduated sanctions”^{1,40}). The net **payoff** accumulated by an agent \(a\) in a game iteration is thus

$$f(a) = \left[ \sum_{s \in \mathbf{S}_{a}} q(a,s) \cdot b(s) \right] - \frac{\gamma}{2}\, \overleftarrow{q}(a)^{2}, \tag{1}$$

where \(\gamma\) is a positive cost parameter.
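The payoff of Eq. 1 can be evaluated directly. The sketch below is illustrative only: the function name `payoff` and the dictionary layout (source label mapped to effort or quality) are assumptions, not part of the original model description.

```python
def payoff(q_a, b, gamma):
    """Net payoff f(a) of Eq. 1 for a single agent.

    q_a   : dict mapping source -> extraction effort q(a, s)
    b     : dict mapping source -> quality b(s)
    gamma : positive cost parameter
    """
    individual_extraction = sum(q_a.values())          # total effort of the agent
    benefit = sum(q * b[s] for s, q in q_a.items())    # sum of q(a, s) * b(s)
    return benefit - 0.5 * gamma * individual_extraction ** 2

# Hypothetical example: one full-quality source and one zero-quality source.
example = payoff({"s1": 0.5, "s2": 0.25}, {"s1": 1.0, "s2": 0.0}, gamma=0.2)
```

In this example the effort placed on the zero-quality source earns nothing but still adds to the quadratic cost, because the cost depends on the agent's total effort.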

### Bistable model of CPR depletion and remediation

Sources are *bistable*, meaning that at any time they can occupy one of two states: (1) a **viable state**, during which the source provides a benefit of magnitude \(\alpha\) in return for each unit of extraction effort, and (2) a **depleted state**, during which this benefit is reduced by \(\beta\) (\(0<\beta \le \alpha\)). Sources immediately transition from viable to depleted states when the collective extraction surpasses a **depletion threshold** \(\overrightarrow{q}_{\mathrm{D}}(s)\). Depleted sources may then transition to viable states again when collective extraction falls below a **remediation threshold** \(\overrightarrow{q}_{\mathrm{R}}(s)\) (Fig. 1a). Source quality is thus given by

$$b(s) = \alpha - \beta \chi(s), \tag{2}$$

where \(\chi(s) = 0\) for viable sources and \(\chi(s) = 1\) for depleted sources. The state \(\chi(s)\) of a source at an iteration \(t\) (we index iterations by subscripts where relevant) thus evolves according to

$$\chi_{t}(s) = \begin{cases} 1, & \text{if } \chi_{t-1}(s) = 0 \text{ and } \overrightarrow{q}_{t}(s) > \overrightarrow{q}_{\mathrm{D}}(s) \\ 0, & \text{if } \chi_{t-1}(s) = 1 \text{ and } \overrightarrow{q}_{t}(s) \le \overrightarrow{q}_{\mathrm{R}}(s) \\ \chi_{t-1}(s), & \text{otherwise} \end{cases} \tag{3}$$

In the results that follow, we focus upon a *uniform capacity* scenario, wherein all sources share identical threshold values \(\overrightarrow{q}_{\mathrm{D}}(s) \equiv \overrightarrow{q}_{\mathrm{D}}\) and \(\overrightarrow{q}_{\mathrm{R}}(s) \equiv \overrightarrow{q}_{\mathrm{R}}\). An alternative *degree-proportional capacity* scenario, in which threshold values increase with source degree, is discussed in the Supplementary Information (S3.4.2).
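The two-threshold rule of Eq. 3 produces hysteresis: between the remediation and depletion thresholds, a source keeps whatever state it already had. A minimal sketch follows, with hypothetical function names `next_state` and `quality`.

```python
def next_state(chi_prev, collective_extraction, q_D, q_R):
    """State update of Eq. 3: 0 = viable, 1 = depleted."""
    if chi_prev == 0 and collective_extraction > q_D:
        return 1                      # viable source is pushed past the depletion threshold
    if chi_prev == 1 and collective_extraction <= q_R:
        return 0                      # depleted source recovers at or below the remediation threshold
    return chi_prev                   # otherwise the state persists

def quality(chi, alpha, beta):
    """Source quality b(s) of Eq. 2."""
    return alpha - beta * chi
```

For example, with thresholds of 1 and 0.001, a collective extraction of 0.5 leaves a viable source viable and a depleted source depleted.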

### Free adaptation

Under the free adaptation strategy, an agent updates its extraction levels independently at each of its affiliated sources depending on the state of each (Fig. 1b). As in the replicator rule often applied in networked evolutionary game models^{17,41,42}, the rate at which an agent adapts its extraction levels within a time interval \(\Delta t\) is proportional to the marginal payoff that the agent expects to attain thereby:

$$\frac{\Delta q(a,s)}{\Delta t} = k\frac{\partial f(a)}{\partial q(a,s)}, \tag{4}$$

where \(k\) is a rate constant. So, each extraction level \(q(a,s)\) is updated according to

$$q_{t+1}(a,s) = q_{t}(a,s) + k\left[ \alpha - \beta \chi_{t}(s) - \gamma\, \overleftarrow{q}_{t}(a) \right]. \tag{5}$$
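The step from Eq. 4 to Eq. 5 can be spelled out as follows. This is a sketch under two assumptions that reflect our reading rather than an explicit statement in the text: the agent treats each source's current quality \(b(s)=\alpha-\beta\chi_{t}(s)\) as fixed when evaluating its marginal payoff, and one iteration corresponds to \(\Delta t=1\). Differentiating Eq. 1 with respect to \(q(a,s)\), and noting that \(\partial \overleftarrow{q}(a)/\partial q(a,s)=1\), gives

$$\frac{\partial f(a)}{\partial q(a,s)} = b(s) - \gamma\, \overleftarrow{q}(a) = \alpha - \beta \chi(s) - \gamma\, \overleftarrow{q}(a).$$

Multiplying by \(k\) and adding the result to \(q_{t}(a,s)\) yields the bracketed term of Eq. 5.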

The higher an agent’s individual extraction \(\overleftarrow{q}(a)\), the more slowly it will increase its extraction from viable sources, and the more rapidly it will reduce its extraction from depleted sources.
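A minimal sketch of the free adaptation update (Eq. 5) for a single agent is given below. The function name and dictionary layout are hypothetical, and the sketch applies no lower bound to extraction levels because none is stated in the text.

```python
def free_adaptation(q_a, chi, alpha, beta, gamma, k):
    """Apply Eq. 5 to every source of one agent.

    q_a : dict mapping source -> current effort q_t(a, s)
    chi : dict mapping source -> state (0 viable, 1 depleted)
    """
    individual_extraction = sum(q_a.values())
    return {
        s: q + k * (alpha - beta * chi[s] - gamma * individual_extraction)
        for s, q in q_a.items()
    }

# Hypothetical example: s1 is viable, s2 is depleted.
updated = free_adaptation({"s1": 0.4, "s2": 0.6}, {"s1": 0, "s2": 1},
                          alpha=1.0, beta=1.0, gamma=0.2, k=0.02)
```

In this example the agent raises its effort at the viable source and lowers it at the depleted one, with each source handled independently.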

### Uniform adaptation

When applying the uniform adaptation strategy, an agent adjusts each of its extraction levels by the same magnitude \(\Delta q(a,s)\equiv \Delta \overleftarrow{q}(a)/m(a)\) (Fig. 1c). Assuming again that the rate at which an agent enacts this update is proportional to the associated marginal payoff, an agent adapts its extraction levels at all of its affiliated sources \(s\) by

$$q_{t+1}(a,s) = q_{t}(a,s) + k\left[ \alpha - \beta\, \overline{\chi}(a) - \gamma\, \overleftarrow{q}_{t}(a) \right], \tag{6}$$

where \(\overline{\chi}(a) = \left[ \sum\nolimits_{s' \in \mathbf{S}_{a}} \chi(s') \right]/m(a)\) is the mean state of the agent’s affiliated sources.
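The uniform adaptation update (Eq. 6) differs from free adaptation only in replacing each source's own state with the mean state of the agent's sources. A minimal sketch, with a hypothetical function name and data layout:

```python
def uniform_adaptation(q_a, chi, alpha, beta, gamma, k):
    """Apply Eq. 6: shift every extraction level of one agent by the same step."""
    individual_extraction = sum(q_a.values())
    mean_state = sum(chi[s] for s in q_a) / len(q_a)   # mean state of the agent's sources
    step = k * (alpha - beta * mean_state - gamma * individual_extraction)
    return {s: q + step for s, q in q_a.items()}

# Hypothetical example: one viable and one depleted source.
updated = uniform_adaptation({"s1": 0.4, "s2": 0.6}, {"s1": 0, "s2": 1},
                             alpha=1.0, beta=1.0, gamma=0.2, k=0.02)
```

Here both sources receive the same increment, so the agent raises its effort at the depleted source as well as the viable one.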

### Reallocation

When practicing reallocation, an agent shifts an increment of extraction effort from a depleted source to a viable source such that its overall individual extraction \(\overleftarrow{q}(a)\) remains unchanged (Fig. 1d). The agent thus randomly selects one depleted source \(s_{\text{D}} \in \mathbf{S}_{a}\) and one viable source \(s_{\text{V}} \in \mathbf{S}_{a}\), if available. Since the marginal payoff per unit reallocated is \(\beta\), the agent updates its extraction levels such that

$$q_{t+1}(a,s) = \begin{cases} q_{t}(a,s) - k\beta, & \text{if } s = s_{\text{D}} \\ q_{t}(a,s) + k\beta, & \text{if } s = s_{\text{V}} \\ q_{t}(a,s), & \text{otherwise} \end{cases} \tag{7}$$

When an agent’s affiliated sources all share the same quality value, no such reallocation is possible, and so the agent retains its present extraction levels: \(q_{t+1}(a,s) = q_{t}(a,s)\) for all \(s \in \mathbf{S}_{a}\).
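A minimal sketch of the reallocation update (Eq. 7) for one agent follows. The function name, the data layout, and the optional `rng` argument are hypothetical. As in Eq. 7, the sketch does not check whether the depleted source still holds at least \(k\beta\) of effort.

```python
import random

def reallocate(q_a, chi, beta, k, rng=random):
    """Apply Eq. 7: move k * beta of effort from one depleted to one viable source.

    q_a : dict mapping source -> current effort q_t(a, s)
    chi : dict mapping source -> state (0 viable, 1 depleted)
    """
    depleted = [s for s in q_a if chi[s] == 1]
    viable = [s for s in q_a if chi[s] == 0]
    if not depleted or not viable:
        return dict(q_a)              # all sources share one state: nothing to reallocate
    s_D = rng.choice(depleted)        # random depleted source
    s_V = rng.choice(viable)          # random viable source
    new_q = dict(q_a)
    new_q[s_D] -= k * beta
    new_q[s_V] += k * beta
    return new_q

# Hypothetical example with exactly one viable and one depleted source.
moved = reallocate({"s1": 0.4, "s2": 0.6}, {"s1": 0, "s2": 1}, beta=1.0, k=0.02)
```

The agent's individual extraction is unchanged by construction, since the same increment is subtracted from one source and added to another.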

### Mixed strategies

An agent’s **adaptation strategy** (\(p_{0},p_{\updownarrow},p_{\leftrightarrow}\)) comprises the probabilities that it will practice each of these update rules in any given round: its *free adaptation propensity* (\(p_{0}\)), its *uniform adaptation propensity* (\(p_{\updownarrow}\)), and its *reallocation propensity* (\(p_{\leftrightarrow}\)). An agent’s choice of a particular update rule is thus based only on its own innate inclinations, but the *rate* at which it enacts the selected rule is influenced by current resource conditions. We first simulate dynamics in which the same adaptation strategy is shared by all members of a population throughout the entire course of a simulation. We then consider games in which agents’ individual adaptation strategies are each allowed to independently evolve under generalized reinforcement learning^{38,43} (Supplementary Information S1.3.4). That is, after enacting a chosen update rule in an iteration \(t\), each agent \(a\) observes the payoff change \(\Delta f_{t}(a)=f_{t}(a)-f_{t-1}(a)\). If \(\Delta f_{t}(a)>0\), then the agent’s relative propensity to practice this update rule in subsequent rounds is increased. If the agent’s payoffs decreased (\(\Delta f_{t}(a)<0\)), then its propensity to apply the given update rule is reduced accordingly.
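The sketch below illustrates how an update rule might be drawn from an adaptation strategy and how a propensity might be nudged after a payoff change. The exact reinforcement rule is given in Supplementary Information S1.3.4 and is not reproduced here: the additive step `eta`, the small `floor`, and the renormalization are all assumptions made purely for illustration, as are the function names.

```python
import random

RULES = ("free", "uniform", "reallocate")   # order matches (p_0, p_updown, p_leftright)

def choose_rule(strategy, rng=random):
    """Draw one update rule with probabilities given by the adaptation strategy."""
    return rng.choices(RULES, weights=strategy, k=1)[0]

def reinforce(strategy, rule, delta_f, eta=0.05, floor=1e-6):
    """Hypothetical reinforcement step (NOT the rule of S1.3.4).

    Raises the propensity of `rule` if the payoff rose, lowers it if the
    payoff fell, then renormalizes so the propensities sum to one.
    """
    p = list(strategy)
    i = RULES.index(rule)
    if delta_f > 0:
        p[i] += eta
    elif delta_f < 0:
        p[i] = max(p[i] - eta, floor)       # keep every propensity positive
    total = sum(p)
    return tuple(x / total for x in p)

start = (1 / 3, 1 / 3, 1 / 3)
rewarded = reinforce(start, "reallocate", delta_f=+0.1)
penalized = reinforce(start, "reallocate", delta_f=-0.1)
```

Renormalizing after the adjustment means that raising one propensity lowers the relative weight of the other two, which matches the text's notion of a relative propensity.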

### Simulations

In simulations of the CPR extraction game, each iteration involves the following steps:

1. **Agents collect payoffs** based on current extraction levels and resource conditions (Eq. 1).
2. Agents are randomly selected for update, each with probability \(u\).
3. Each updating agent **selects an update rule** based on its adaptation strategy (\(p_{0},p_{\updownarrow},p_{\leftrightarrow}\)).
4. Each updating agent **adjusts its extraction levels** in accord with the chosen update rule (Eqs. 5–7).
5. If applicable, updating agents **adjust their adaptation strategies** using reinforcement learning.
6. **The state \(\chi(s)\) of each source is updated** based on the new extraction levels (Eq. 3).

Although the update rules of Eqs. (5–7) deterministically specify the changes an agent will make when enacting a certain update rule under a given set of resource conditions (Step 4), each iteration involves randomness in the set of agents that update (Step 2), in the selections of update rules by each updating agent (Step 3), and in the choices of sources involved in reallocation moves (Eq. 7).
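The iteration steps can be sketched as a single loop for a population that shares one fixed adaptation strategy, so Step 5 is omitted. This is an illustrative sketch, not the authors' implementation. It assumes that all updating agents respond to the same start-of-iteration extraction levels and source states, and that extraction levels are not bounded below. The function name `simulate` and the toy network are hypothetical.

```python
import random

RULES = ("free", "uniform", "reallocate")

def simulate(S, A, strategy, steps, alpha=1.0, beta=1.0, gamma=0.2,
             k=0.02, u=0.5, q_D=1.0, q_R=0.001, seed=0):
    """S[a]: list of sources of agent a; A[s]: list of agents of source s.
    Returns (q, chi, payoffs), with payoffs taken from the last iteration."""
    rng = random.Random(seed)
    # Random initial efforts in [0, q_D / n(s)]; all sources start viable.
    q = {(a, s): rng.uniform(0.0, q_D / len(A[s])) for a in S for s in S[a]}
    chi = {s: 0 for s in A}
    payoffs = {}
    for _ in range(steps):
        total = {a: sum(q[a, s] for s in S[a]) for a in S}
        # Step 1: payoffs (Eq. 1) with quality alpha - beta * chi(s) (Eq. 2).
        payoffs = {a: sum(q[a, s] * (alpha - beta * chi[s]) for s in S[a])
                      - 0.5 * gamma * total[a] ** 2 for a in S}
        new_q = dict(q)
        for a in S:
            if rng.random() >= u:          # Step 2: update with probability u
                continue
            rule = rng.choices(RULES, weights=strategy, k=1)[0]   # Step 3
            if rule == "free":             # Step 4, Eq. 5
                for s in S[a]:
                    new_q[a, s] = q[a, s] + k * (alpha - beta * chi[s] - gamma * total[a])
            elif rule == "uniform":        # Step 4, Eq. 6
                mean_state = sum(chi[s] for s in S[a]) / len(S[a])
                for s in S[a]:
                    new_q[a, s] = q[a, s] + k * (alpha - beta * mean_state - gamma * total[a])
            else:                          # Step 4, Eq. 7
                depleted = [s for s in S[a] if chi[s] == 1]
                viable = [s for s in S[a] if chi[s] == 0]
                if depleted and viable:
                    new_q[a, rng.choice(depleted)] -= k * beta
                    new_q[a, rng.choice(viable)] += k * beta
        q = new_q
        for s in A:                        # Step 6: source states (Eq. 3)
            load = sum(q[a, s] for a in A[s])
            if chi[s] == 0 and load > q_D:
                chi[s] = 1
            elif chi[s] == 1 and load <= q_R:
                chi[s] = 0
    return q, chi, payoffs

# Hypothetical toy network with three agents and three sources.
S = {"a1": ["s1", "s2"], "a2": ["s2"], "a3": ["s2", "s3"]}
A = {"s1": ["a1"], "s2": ["a1", "a2", "a3"], "s3": ["a3"]}
q, chi, payoffs = simulate(S, A, strategy=(1.0, 0.0, 0.0), steps=300, seed=1)
```

Seeding the generator makes a run reproducible, which helps when comparing strategies on the same network.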

In the simulation results presented below, all sources share \(\alpha =\beta =1\), \(\overrightarrow{q}_{\mathrm{D}}=1\), \(\overrightarrow{q}_{\mathrm{R}}=0.001\), and initial state \(\chi(s)=0\). All agents share \(u=0.5\), \(k=0.02\), and \(\gamma =0.2\) unless otherwise noted. The parameter settings considered here (\(\beta =\alpha\)) represent complete, catastrophic resource depletion. This choice simplifies our analyses by ensuring that agents practicing free adaptation (Eq. 5) will always reduce their extraction from a depleted source and eventually trigger its remediation, regardless of the choice of the cost parameter \(\gamma >0\) or remediation threshold \(\overrightarrow{q}_{\mathrm{R}}(s)\). That is, all resource depletion events are assumed to be extreme enough to motivate agents to continuously “self-regulate” by remediating depleted sources (see Supplementary Information S5 for a more thorough discussion of these parameter settings).

In simulations where reinforcement learning is applied, all agents are initialized with \(p_{\updownarrow}=p_{\leftrightarrow}=0.333\). For pure free adaptation simulations (\(p_{0}=1\)), initial extraction levels were randomized (\(q_{t=0}(a,s)\in [0,\overrightarrow{q}_{\mathrm{D}}(s)/n(s)]\)). All other simulations (\(p_{0}<1\)) were then initialized by setting each agent’s individual extraction to its average value from the free adaptation simulation, \(\overline{\overleftarrow{q}(a)}_{0}\), allocated equally among its affiliated sources (\(q_{t=0}(a,s)=\overline{\overleftarrow{q}(a)}_{0}/m(a)\)). Simulations were iterated through \(10^{5}\) steps, and time-averaged quantities were computed over the final \(8\times 10^{4}\) iterations: e.g., \(\overline{q}(a,s)=\left(\frac{1}{8\times 10^{4}}\right)\sum_{t=2\times 10^{4}}^{10^{5}}q_{t}(a,s)\). This duration was chosen based on inspection of simulation results to ensure that free adaptation extraction levels would have settled into steady ranges, and that mixed-strategy and reinforcement learning dynamics (where applicable) would have approached their stable mean values, after an initial transient period.
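To see why setting \(\beta =\alpha\) guarantees that free adaptation always reduces extraction from a depleted source, substitute \(\chi_{t}(s)=1\) into the free adaptation update. The following short worked step uses only the update rule and the parameter values stated above:

$$q_{t+1}(a,s) - q_{t}(a,s) = k\left[ \alpha - \beta - \gamma\, \overleftarrow{q}_{t}(a) \right] = -k\gamma\, \overleftarrow{q}_{t}(a) \quad \text{when } \beta = \alpha .$$

This change is negative for any \(\gamma >0\) whenever the agent's individual extraction is positive. With \(k=0.02\) and \(\gamma =0.2\), each update lowers the effort at a depleted source by \(0.004\,\overleftarrow{q}_{t}(a)\). If instead \(\beta <\alpha\), the bracketed term is negative only when \(\overleftarrow{q}_{t}(a) > (\alpha -\beta)/\gamma\), so a reduction would no longer be guaranteed.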

Source: Ecology - nature.com