Improving biodiversity protection through artificial intelligence
A biodiversity simulation framework
We have developed a simulation framework modelling biodiversity loss to optimize and validate conservation policies (in this context, decisions about data gathering and area protection across a landscape) using a reinforcement learning (RL) algorithm. We implemented a spatially explicit individual-based simulation to assess future biodiversity changes based on natural processes of mortality, replacement and dispersal. Our framework also incorporates anthropogenic processes such as habitat modifications, selective removal of a species, rapid climate change and existing conservation efforts. The simulation can include thousands of species and millions of individuals, and it tracks population sizes and species distributions and how they are affected by anthropogenic activity and climate change (for a detailed description of the model and its parameters see Supplementary Methods and Supplementary Table 1).
In our model, anthropogenic disturbance alters natural mortality rates at the species level, by an amount that depends on the sensitivity of each species. It also affects the total number of individuals (the carrying capacity) of any species that can inhabit a spatial unit. Because sensitivity to disturbance differs among species, the relative abundance of species in each cell changes after adding disturbance and upon reaching the new equilibrium. The effect of climate change is modelled as locally affecting the mortality of individuals based on species-specific climatic tolerances. As a result, more tolerant or warmer-adapted species will tend to replace sensitive species in a warming environment, thus inducing range shifts, contraction or expansion across species depending on their climatic tolerance and dispersal ability.
We use time-forward simulations of biodiversity in time and space, with increasing anthropogenic disturbance through time, to optimize conservation policies and assess their performance.
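To make the role of species-specific sensitivity concrete, the following is a minimal sketch of how disturbance could raise mortality and lower carrying capacity. The functional forms, the function names and the numerical values are illustrative assumptions, not the model's equations, which are given in the Supplementary Methods.

```python
def death_probability(base_mortality, disturbance, sensitivity):
    """Per-step death probability of one individual (illustrative form only)."""
    # Assumed form: with no disturbance the natural rate applies; full
    # disturbance on a fully sensitive species pushes the probability to 1.
    return base_mortality + (1.0 - base_mortality) * disturbance * sensitivity


def carrying_capacity(pristine_capacity, disturbance):
    """Individuals a cell can hold, assumed to shrink linearly with disturbance."""
    return int(pristine_capacity * (1.0 - disturbance))


# Two hypothetical species in the same cell, under the same disturbance level.
p_tolerant = death_probability(0.1, disturbance=0.5, sensitivity=0.2)
p_sensitive = death_probability(0.1, disturbance=0.5, sensitivity=1.0)
capacity = carrying_capacity(1000, disturbance=0.25)
```

Under these assumptions the sensitive species dies at a higher rate than the tolerant one in the same cell, which is the kind of asymmetry that shifts relative abundances once disturbance is added.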
Along with a representation of the natural and anthropogenic evolution of the system, our framework includes an agent (that is, the policy maker) taking two types of actions: (1) monitoring, which provides information about the current state of biodiversity of the system, and (2) protecting, which uses that information to select areas for protection from anthropogenic disturbance. The monitoring policy defines the level of detail and temporal resolution of biodiversity surveys. At a minimal level, these include species lists for each cell, whereas more detailed surveys provide counts of population size for each species. The protection policy is informed by the results of monitoring and selects protected areas in which further anthropogenic disturbance is maintained at an arbitrarily low value (Fig. 1). Because the total number of areas that can be protected is limited by a finite budget, we use an RL algorithm (ref. 42) to optimize how to perform the protecting actions based on the information provided by monitoring, such that it minimizes species loss or other criteria depending on the policy.
We provide a full description of the simulation system in the Supplementary Methods. In the sections below we present the optimization algorithm, describe the experiments carried out to validate our framework and demonstrate its use with an empirical dataset.
Conservation planning within a reinforcement learning framework
In our model we use RL to optimize a conservation policy under a predefined policy objective (for example, to minimize the loss of biodiversity or maximize the extent of protected area). The CAPTAIN framework includes a space of actions, namely monitoring and protecting, that are optimized to maximize a reward R. The reward defines the optimality criterion of the simulation and can be quantified as the cumulative value of species that do not go extinct throughout the timeframe evaluated in the simulation.
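As a minimal sketch of this reward, the function below sums the values of the species that still have at least one individual at the end of a simulation. The dictionary-based data layout, the function name and the example numbers are assumptions made for illustration.

```python
def cumulative_reward(final_abundance, species_value=None):
    """Sum the value of every species that has not gone extinct.

    final_abundance: dict mapping species name -> individuals left at the end.
    species_value:   optional dict mapping species name -> value; when omitted,
                     every species counts as 1 (an assumed default).
    """
    total = 0.0
    for species, individuals in final_abundance.items():
        if individuals > 0:  # the species survived the simulated timeframe
            total += 1.0 if species_value is None else species_value[species]
    return total


# Hypothetical outcome: species "b" went extinct during the simulation.
final_abundance = {"a": 12, "b": 0, "c": 3}
equal_value_reward = cumulative_reward(final_abundance)
weighted_reward = cumulative_reward(
    final_abundance, {"a": 1.0, "b": 5.0, "c": 2.5}
)
```

Swapping the value dictionary changes what the optimization favours without changing the rest of the procedure.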
If the value is set equal across all species, the RL algorithm will minimize overall species extinctions. However, different definitions of value can be used to minimize loss based on evolutionary distinctiveness of species (for example, minimizing phylogenetic diversity loss), or their ecosystem or economic value. Alternatively, the reward can be set equal to the amount of protected area, in which case the RL algorithm maximizes the number of cells protected from disturbance, regardless of which species occur there. The amount of area that can be protected through the protecting action is determined by a budget Bt and by the cost of protection ($C_{t}^{c}$), which can vary across cells c and through time t.
The granularity of monitoring and protecting actions is based on spatial units that may include one or more cells and which we define as the protection units. In our system, protection units are adjacent, non-overlapping areas of equal size (Fig. 1) that can be protected at a cost equal to the sum of the costs of all cells included in the unit.
The monitoring action collects information within each protection unit about the state of the system St, which includes species abundances and geographic distribution:
$$S_{t} = \{\mathbf{H}_{t}, \mathbf{D}_{t}, \mathbf{F}_{t}, \mathbf{T}_{t}, \mathbf{C}_{t}, \mathbf{P}_{t}, B_{t}\} \tag{1}$$
where Ht is the matrix with the number of individuals across species and cells, Dt and Ft are matrices describing anthropogenic disturbance on the system, Tt is a matrix quantifying climate, Ct is the cost matrix, Pt is the current protection matrix and Bt is the available budget (for more details see Supplementary Methods and Supplementary Table 1). We define as feature extraction the result of a function X(St), which returns for each protection unit a set of features summarizing the state of the system in the unit. The number and selection of features (Supplementary Methods and Supplementary Table 2) depend on the monitoring policy πX, which is decided a priori in the simulation. A predefined monitoring policy also determines the temporal frequency of this action throughout the simulation, for example, only at the first time step or repeated at each time step. The features extracted for each unit represent the input upon which a protecting action can take place, if the budget allows for it, following a protection policy πY. These features (listed in Supplementary Table 2) include the number of species that are not already protected in other units, the number of rare species and the cost of the unit relative to the remaining budget. Different subsets of these features are used depending on the monitoring policy and on the optimality criterion of the protection policy πY.
We do not assume species-specific sensitivities to disturbance (parameters ds, fs in Supplementary Table 1 and Supplementary Methods) to be known features, because a precise estimation of these parameters in an empirical case would require targeted experiments, which we consider unfeasible across a large number of species. Instead, species-specific sensitivities can be learned from the system through the observation of changes in the relative abundances of species (x3 in Supplementary Table 2).
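The three features named above can be illustrated with a small sketch of a feature extraction function. The per-unit dictionary layout, the function name and in particular the definition of a rare species (total population at or below a fixed threshold) are assumptions chosen for the example; the features actually used are listed in Supplementary Table 2.

```python
def extract_features(unit_abundance, protected, unit_cost, budget, rare_threshold=10):
    """Return one (n_unprotected, n_rare, relative_cost) tuple per protection unit.

    unit_abundance: list with one dict per unit, mapping species -> individuals.
    protected:      list of booleans, True where the unit is already protected.
    rare_threshold: assumed cut-off on total population that defines "rare".
    """
    # Total population of each species across the whole landscape.
    totals = {}
    for counts in unit_abundance:
        for species, n in counts.items():
            totals[species] = totals.get(species, 0) + n
    rare = {sp for sp, n in totals.items() if 0 < n <= rare_threshold}

    features = []
    for u, counts in enumerate(unit_abundance):
        present = {sp for sp, n in counts.items() if n > 0}
        # Species that already occur in some other, protected unit.
        covered_elsewhere = set()
        for v, other in enumerate(unit_abundance):
            if v != u and protected[v]:
                covered_elsewhere.update(sp for sp, n in other.items() if n > 0)
        n_unprotected = len(present - covered_elsewhere)
        n_rare = len(present & rare)
        relative_cost = unit_cost[u] / budget if budget > 0 else float("inf")
        features.append((n_unprotected, n_rare, relative_cost))
    return features


# Hypothetical landscape with three protection units; unit 0 is protected.
units = [{"a": 5, "b": 100}, {"b": 50, "c": 2}, {"a": 0, "c": 1}]
features = extract_features(
    units, protected=[True, False, False], unit_cost=[2.0, 1.0, 4.0], budget=4.0
)
```

In this example, unit 1 holds two species, but only one of them is absent from the protected unit 0, so its first feature is 1.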
The features tested across different policies are specified in the subsection Experiments below and in the Supplementary Methods.
The protecting action selects a protection unit and resets the disturbance in the included cells to an arbitrarily low level. A protected unit is also immune from future anthropogenic disturbance increases, but protection does not prevent climate change in the unit. The model can include a buffer area along the perimeter of a protected unit, in which the level of protection is lower than in the centre, to mimic the generally negative edge effects in protected areas (for example, higher vulnerability to extreme weather). Although protecting a disturbed area theoretically allows it to return to its initial biodiversity levels, population growth and species composition of the protected area will still be controlled by the death–replacement–dispersal processes described above, as well as by the state of neighbouring areas. Thus, protecting an area that has already undergone biodiversity loss may not result in the restoration of its original biodiversity levels.
The protecting action has a cost determined by the cumulative cost of all cells in the selected protection unit. The cost of protection can be set equal across all cells and constant through time. Alternatively, it can be defined as a function of the current level of anthropogenic disturbance in the cell. The cost of each protecting action is taken from a predetermined finite budget and a unit can be protected only if the remaining budget allows it.
Policy definition and optimization algorithm
We frame the optimization problem as a stochastic control problem where the state of the system St evolves through time as described in the section above (see also Supplementary Methods), but it is also influenced by a set of discrete actions determined by the protection policy πY.
The protection policy is a probabilistic policy: for a given set of policy parameters and an input state, the policy outputs an array of probabilities associated with all possible protecting actions. While optimizing the model, we sample actions according to the probabilities produced by the policy to make sure that we explore the space of actions. When we run experiments with a fixed policy instead, we choose the action with the highest probability. The input state is transformed by the feature extraction function X(St) defined by the monitoring policy, and the features are mapped to a probability through a neural network with the architecture described below.
In our simulations, we fix the monitoring policy πX, thus predefining the frequency of monitoring (for example, at each time step or only at the first time step) and the amount of information produced by X(St), and we optimize πY, which determines how to best use the available budget to maximize the reward. Each action A has a cost, defined by the function Cost(A, St), which here we set to zero for the monitoring action (X) across all monitoring policies. The cost of the protecting action (Y) is instead set to the cumulative cost of all cells in the selected protection unit. In the simulations presented here, unless otherwise specified, the protection policy can only add one protected unit at each time step, if the budget allows, that is if Cost(Y, St)
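The difference between sampling an action during optimization and taking the most probable action with a fixed policy can be sketched as follows. This is a hedged illustration: a linear score followed by a softmax stands in for the neural network, the function names and weights are hypothetical, and treating a unit as affordable when its cost does not exceed the budget is an assumption of the example.

```python
import math
import random


def action_probabilities(features, weights, costs, budget):
    """Softmax over per-unit scores, restricted to units the budget can pay for."""
    scores = [sum(w * x for w, x in zip(weights, f)) for f in features]
    affordable = [c <= budget for c in costs]  # assumed affordability rule
    if not any(affordable):
        return [0.0] * len(features)
    # Subtract the largest affordable score for numerical stability.
    top = max(s for s, ok in zip(scores, affordable) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, affordable)]
    total = sum(exps)
    return [e / total for e in exps]


def choose_unit(probs, explore, rng=random):
    """Sample a unit while optimizing; take the most probable unit otherwise."""
    if sum(probs) == 0.0:
        return None  # nothing is affordable, so no unit is protected this step
    if explore:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical features per unit: (unprotected species, rare species, relative cost).
features = [(2, 1, 0.5), (1, 1, 0.25), (1, 1, 1.0)]
probs = action_probabilities(
    features, weights=(1.0, 0.5, -1.0), costs=[2.0, 1.0, 4.0], budget=3.0
)
greedy_unit = choose_unit(probs, explore=False)
sampled_unit = choose_unit(probs, explore=True, rng=random.Random(0))
```

Masking unaffordable units before normalizing keeps the probabilities of the remaining units summing to one, so at most one affordable unit is selected per step.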