Interpreting random forest analysis of ecological models to move from prediction to explanation

Random forest: feature importance and interactivity

Our random forests produced highly accurate predictions of local stability when trained on model output from the full dataset (e.g., AUC = 0.998 across all 5 parameters, see Fig. 2A) and all tested subsets. Running random forests on the full results set with all five parameters as predictors indicated both demographic and trophic rates were important to understanding resultant model stability. Moreover, results reveal that whether in multi-stage (red line; Fig. 2A) or single stage herbivory (e.g., ({a}_{2}) = 0, ({a}_{F}) ≥ 0; blue line Fig. 2A), parameters’ contribution to predictive power is related to their interactivity with other parameters (blue line; Fig. 2A). Note, a similar analysis with ({a}_{2}) > 0 and ({a}_{F}) = 0 is not possible because this type of herbivory is always stable.

This interactivity was apparent in our attempts to understand how our specific parameters affected the behavior of our model in Eq. (1) via studying their effects as features in driving random forest predictions. Initial investigations into individual feature effects revealed that the effect of any single feature (parameter) on trophic dynamics could change substantially based on the values of our other features (parameters). Specifically, the average marginal effects (e.g., PD plots; Fig. S3) on simulation dynamics belied a high degree of variability in feature effects throughout the simulation data (e.g., ICE plots; Fig. S3).

Breaking down results into further subsets of set specific attack rates with varying demographic rates revealed that this variability in feature effects was largely based on the changes in feature importance and effect over different allocations of herbivory on ontogenetic stages. This breakdown affected the relationship between importance and interactivity (Fig. 2A) such that it was inconsistent but still visible in aggregate across our simulation parameters (Fig. 2B,C). Figure 2D–F depict how different allocations and intensity of herbivory across plant ontogeny change the influence of each demographic parameter in driving model stability.

Given how the influence of plant demographic rates over model behavior changed across trophic allocation (Fig. 2D–F), we first focused in depth analysis on variable demographic rates across static allocations of herbivore attack rates. By limiting the number of varying features, we use multivariate analysis to develop a fuller understanding of dynamics in subsections of the data which functioned as a scaffolding for further investigation. Specifically, we took a hierarchical approach, first developing an understanding of single-stage herbivory as a basis to study single-stage dominant herbivory (Fig. 3), which then leads us to a better overall understanding of our system’s dynamics across all trophic rates.

Figure 3

Interactive feature effects on model behavior. Across different herbivory allocations, partial dependence (PD) plots (A,C,E) show interactive effects between maturation rates on categorical simulation stability. Threshold plots (B,D,F) extend this analysis to include gradations of seed production rates. (A,B) Herbivory allocation ({a}_{F}) = 1.0 and ({a}_{2}) = 0.0. (A) Partial dependence plot shows probability of stability across all values of ({r}_{F}). (B) Threshold plot shows the location of the threshold between stable and unstable dynamics in {({g}_{12}),({g}_{2F})} parameter space as a function of seed production levels (({r}_{F})). (C,D) Herbivory allocation ({a}_{F}) = 0.2 and ({a}_{2}) = 1.0. (C) Partial dependence plot shows probability of stability across all values of ({r}_{F}). (D) Threshold plot shows the location of the threshold between stable and unstable dynamics in {({g}_{12}), ({g}_{2F})} parameter space as a function of seed production levels (({r}_{F})). (E,F) Herbivory allocation ({a}_{F}) = 1.0 and ({a}_{2}) = 0.2. (E) Partial dependence plot shows probability of stability across all values of ({r}_{F}). (F) Threshold plot shows the location of the threshold between stable and unstable dynamics in {({g}_{12}), ({g}_{2F})} parameter space as a function of seed production levels (({r}_{F})).

Full size image

Single stage consumption

In the case of the seedling-only herbivore (({S}_{2}); via ({a}_{2}) > 0 and ({a}_{F}) = 0), all simulations produced stable trophic dynamics. This occurs because density loss in the seedling stage means more juveniles never reach maturity and reproduce themselves¹⁹. This essentially reduces the effective reproduction rate, limits the reproductive plant density, and decreases resources available to the herbivore (similar to lowering intrinsic reproduction in the classic Lotka–Volterra model). In fact, seedling herbivory only induced oscillations at higher handling times, a common effect of high handling time (results not shown).

On the other hand, concentrating consumption on the fecund stage ((F)) can induce both stable and oscillating trajectories (Fig. S4). Consumption of (F) does not induce the same regulation of reproductive potential that stabilizes under seedling-only consumption, and so is vulnerable to boom/bust populations cycles. We chose the two most consistently important (Fig. 2B) and interactive (Fig. 2C and Fig. S5) parameters, ({g}_{12}) and ({g}_{2F}), in order to search for dominant effects on model behavior and their interactions. These parameters functioned as focal axes for our two-dimensional PD plots³⁶. These PD plots depict the estimates of marginal effect of each parameter on random forest predictions, which in this case is categorical stability (Fig. 3A). We can see that stability estimates are increased by lowering either or both per-capita germination and/or maturation rates (({g}_{12}) and ({g}_{2F})). Demographically, reduced maturation rates shift the ratio of plant population density across its ontogeny, creating a larger juvenile population shielded from consumer pressure. Trophically, this restricts resources for the herbivore, thereby limiting losses in plant density due to herbivory (({theta }_{F})) relative to the overall plant density.

This mechanism is so influential in determining trophic dynamics, its effect on stability is statistically detectable pre-simulation via equilibrium values. Losses in plant density due to herbivory are labeled under brackets in Eq. (1) as ({theta }_{F}) and ({theta }_{2}), which we can represent as ({theta }_{F}^{*}) and ({theta }_{2}^{*}) at equilibria. Relative to overall plant density we can define a ratio for plants of consumptive losses to total density (L:D ratio) such that:

$$mathrm{L}:mathrm{D ratio}=({theta }_{F}^{*}+ {theta }_{2}^{*})/({S}_{1}^{*} +{S}_{2}^{*}+{F}^{*}).$$

(2)

When applied as a predictor variable on the same adult-herbivory subsection presented in Fig. 3A via a simple linear regression, we can see that L:D ratio alone explains ~ 45% of the variance of the maximum eigenvalue in simple linear models (F-statistic: 4578 on 1 and 5598 DF, p-value: < 2.2e−16) and produces an AUC score of ~ 0.83 when predicting categorical stability. Comparatively, our random forest using simulation parameters produced an AUC of 0.98, making it clear that our L:D ratio mechanism explains some but not all the variance in stability outcomes.

Our PD plot (Fig. 3A) shows that as both ({g}_{12}) and ({g}_{2F}) increase, predictions gradually shift from stable to unstable. Based on this observation, we can make a “threshold plot” which depicts thresholds between our categorical variables, stable and unstable behavior, as a function of a third yet unexamined parameter, which in our case is seed production, ({r}_{F}) (Fig. 3B). Plotting the thresholds between our stability categories shows a similar dynamic between ({g}_{12}) and ({g}_{2F}) as seen in the PD plot. It also reveals that the gradual changes seen in the PD plot were in fact a function of the rate of seed production, ({r}_{F}), where higher seed production supports stability at higher maturation rates. This is striking given that increased resource production is generally a destabilizing influence in the traditional Lotka-Volterra formulation. Increases in seed production are also related to increases in L:D ratio (Fig. S6), so the stabilizing effect of ({r}_{F}) must be coming from a different mechanism. Using a similar analysis on pre-simulation equilibrium values as was done with L:D ratio, we can integrate ({delta }^{*}) (see Eq. (1) and description in Table 1) into our regression given its clear connection to rF values. Doing so raises our explanatory power in our regression (R² = 0.95 in this subset of data) and our predictive power (AUC = 0.98), giving us comparable results to our random forest. However, this explains very little of the ecology given that ({delta }^{*}) is largely just the immediate realized effect of ({r}_{F}) and doesn’t describe any other changes in system behavior.

Examining the effects of ({r}_{F}) on the rest of our system shows that ({r}_{F}) is also positively correlated with functions ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*}) (see definition and description of these functions in Eq. (1) and Table 1, respectively; Fig. S6). This implies a stabilizing effect of high densities of maturing seedlings, which seems unintuitive given the stabilizing effect of lowering baseline maturation rates, ({g}_{12}) and ({g}_{2F}). The key difference is the mechanistic difference between increasing ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*}) via baseline maturation rates (({g}_{12}) and ({g}_{2F})) vs increasing overall seedling density via increases in seed production (({r}_{F})) (see Appendix S3.2 for more details). Increasing ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*}) via baseline maturation rates (({g}_{12}) and ({g}_{2F})) changes the density distribution ratio in favor of (F) (Fig. S7a), inducing boom-bust cycles and oscillations as more plant individuals move into the adult stage and are consumed. On the other hand, increased seed production (({r}_{F})) increases overall plant density, which increases density dependent limitations on maturation, shifts the range of potential density distributions in the plant population, and saturates the younger stages, ({S}_{1}) and ({S}_{2}) (Fig. S7b). In adult-only herbivory, these saturated stages face no consumer pressure and therefore act as a more immediate reservoir to replace adults lost to consumption by (H). The functions ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*}) increase because there are simply more seeds and seedlings maturing. Therefore, even though increasing plant density may lead to higher overall consumption, the reservoir of density in younger stages titrates into adult stages due to consumption, raises the trough of oscillations as ({r}_{F}) (and subsequently ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*})) increases (Fig. S8) and ultimately leads to a stabilizing consumer dynamic.

This description offers a fuller ecological explanation of how the distribution of density across plant stages mediates herbivore resource availability and drives the observed parameter context-dependence. Specifically, we observe two ways that shifting internal plant demography can promote stability. First, we observe that lowering the average per-capita maturation rates (({g}_{12}) and ({g}_{2F})) shields plant density in younger stages. This sequestration of plant density stabilizes by directly restricting resource availability for the herbivore and preventing overconsumption. Second, we observe that raising the intrinsic seed production rate (({r}_{F})) saturates plant density across the plant population which results in increased resource availability for the herbivore. Despite this increase in resource availability, the system is stabilized by density-dependent limitations on maturation and a robust supply of replacement plant density in younger stages which prevents overconsumption of the adult stage. The observation that increasing resource availability for the herbivore can be both stabilizing (via ({r}_{F})) or destabilizing (via ({g}_{12}),({g}_{2F})) depending on the specific parameters underlies much of the parameter context dependence detected by our initial random forest (Fig. 2A).

Integrating the ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*}) factors into our earlier linear regression analysis (with partial least regression due to collinearity between our variables) shows that our three factors (L:D ratio, ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*})) explain 91% of the variance of the maximum eigenvalue (from Fig. 3A,B) and matches our expected direction of effect (shown in Fig. 3A). Predictive power increased as well (AUC: 0.98 for categorical stability; RMSE 0.003 for regression on max eigenvalues).

Multi-stage consumption

Having established baseline dynamics, we move into multi-staged consumption by supplementing single-stage consumption with ancillary consumption on the complementary stage. Specifically, we start by supplementing a seedling-oriented herbivore with limited attack on adults (({a}_{2}) = 1 and ({a}_{F}) = 0.2) while revealing the importance of interactivity in parameter effects on understanding model behavior. Using this subset of simulation data, we trained a random forest and tested its predictive accuracy. As expected, random forests are sufficiently flexible to maintain high predictive accuracy despite these changes (AUC: 0.98 for categorical stability; RMSE: 0.0001 for regressions of max eigenvalue).

Simulation results reveal that the stability of the seedling-only consumer is vulnerable to destabilization from even limited multi-stage consumption (Fig. 3C,D). PD plots offer some ecological explanation in showing that oscillations can still be stabilized by restrictions in maturation rates (Fig. 3C). Extending this analysis with our threshold plots we see that the effect of ({r}_{F}) has effectively been reversed (Fig. 3D). Overall then, the slight addition of adult consumption institutes a relationship between ({g}_{12}) and ({g}_{2F}) that mimics an adult-only herbivore but with the caveat that higher ({r}_{F}) values require substantially more restricted maturation to stabilize dynamics.

We can explain this result using our earlier ecological factors (L:D ratio, ({gamma }_{12}^{*}), ({gamma }_{2F}^{*})). Compared to the adult-only herbivore, we can see that in the seedling-dominant herbivore, raising ({r}_{F}) has much higher proportional effect on L:D ratio compared to ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*}) (Fig. S9). This is because L:D ratio now consists of consumption on both stages and the seedling stage can no longer act as a saturating reservoir with increased seed production. Additionally, increasing either ({gamma }_{12}^{*}) or ({gamma }_{2F}^{*}) via ({r}_{F}) (higher density) induces higher cumulative attacks ((theta) values; see Eq. (1) and Table 1) on both stages. This becomes clear when we regress our ecological factors on our eigenvalue data from this subset of herbivory data. We see that the effects of ({gamma }_{12}^{*}) and ({gamma }_{2F}^{*}) have switched from stabilizing to destabilizing (i.e., raising the max eigenvalue; Fig. 4A). Despite these changes, our linear model using our ecological factors still performs comparatively well to our random forest predictions (AUC: 0.99 for categorical stability; RMSE 0.002 (R² = 0.97) for regression on max eigenvalues).

Figure 4

Ecological drivers of stability. Ecological factor effects on stability via coefficients from partial least squares regression of ecological factors versus maximum eigenvalue. Regressions are run for multiple herbivory allocations. (A) Bar graph showing specific changes in effect on maximum eigenvalue (regression coefficients) for each ecological factor across the four specific herbivory allocations investigated in-depth in the “Results”. (B–D) Range of effects for each ecological factor on stability shown via heatmaps of regressions coefficients on maximum eigenvalue across all specific combinations of herbivory on the adult and seedling stages. Note, the gray square when both attack rates are set to 0 indicates no data given the lack of consumption.

Full size image

We tested this further by supplementing an adult-oriented herbivore with a limited attack on the seedling stage (({a}_{2}) = 0.2, ({a}_{F}) = 1). Once again, we trained and validated a random forest on this subset of herbivory data and once again the random forest proved adept in providing accurate predictions (AUC: 0.98 for categorical stability; RMSE: 0.0002 for regressions of max eigenvalue). Similar to before, the addition of only slight amounts of multi-stage consumption qualitatively changes the relationships amongst model parameters (input features) and stability. PD plots show that lower ({g}_{12}) is again stabilizing. However, it also reveals stability at lower ({g}_{12}) values is now more dependent on higher ({g}_{2F}) (Fig. 3E). Extending the analysis with our threshold plots again indicates that higher seed production values (({r}_{F})) limit stable {({g}_{12}), ({g}_{2F})} parameter space (Fig. 3F).

Similar to seedling-oriented herbivory, this multi-stage consumption limits the function of saturating reservoirs in the seedlings and induces more instances of oscillatory dynamics compared to single stage consumption on the fecund stage. However, using our ecological factors, we can investigate the demographic conditions which still promote stability. The stability found at high ({g}_{2F}) and low ({g}_{12}) can also be described as high ({gamma }_{2F}^{*}) and low ({gamma }_{12}^{*}) with exact values contingent upon the seed production rate (Fig. S10a). These demographic conditions reduce the composition of plants in the seedling stage. This limits herbivore consumption on the seedlings (Fig. S10b) and causes the interaction to function more like single stage consumption. This functional similarity to single stage consumption on the adult stage means this promotes the stabilizing effect of a reservoir in the seed bank (low ({g}_{12})) and a high replacement rate of adults (high ({g}_{2F})). Consequently, simulation results show higher rates of stability at these conditions (Fig. S10c) and the coefficients from our partial least squares regression correspond with our analysis (Fig. 4A). Additionally, our ecological factors once again not only aid in explaining our results but also perform well as predictors (AUC: 0.99 for categorical stability; RMSE: 0.003 (R² = 0.99) for regressions of max eigenvalue).

In fact, across all of unique herbivory allocations, our ecological factors perform well as predictors in both categorical (mean AUC: 0.99) and regression based (mean RMSE: 0.002) linear models. The accuracy of our predictions, though not equivalent to causation, affords some confidence in using our partial least squares regression coefficients (on max eigenvalues) to investigate how the effects of each ecological factor change across different allocations of herbivory (Fig. 4B–D). With this we get a detailed view of how plant demography interacts with trophic rates to drive dynamics of our trophic interaction. These results were generally qualitatively consistent across handling times and strength of density dependence (see Table 1) with a notable exception when handling times for seedling consumption are smaller than for adult consumption (see Fig. S11).

Finally, expanding back out and considering the full simulation dataset reveals our ecological factors demonstrate predictive accuracy even across all demographic and trophic rates. Using our linear partial least squares model on maximum eigenvalues showed modest success (mean RMSE = 0.012 across all permutations of handling time and density dependence). Categorically predicting stability using our factors and the binomial regression was comparatively more successful (mean AUC = 0.91 across all permutations of handling time and density dependence).

Source: Ecology - nature.com

Interpreting random forest analysis of ecological models to move from prediction to explanation

Random forest: feature importance and interactivity

Single stage consumption

Multi-stage consumption

Nanotube sensors are capable of detecting and distinguishing gibberellin plant hormones

Study: Smoke particles from wildfires can erode the ozone layer

ITALIAN LANGUAGE

ENGLISH LANGUAGE