Continent-wide tree fecundity driven by indirect climate effects
Elements of TA
Identifying biogeographic trends within volatile data required several innovations in the MASTIF model20, building from multivariate state-space methods in previous applications41,52. Standard modeling options, such as generalized linear models and their derivatives, do not accommodate key features of the masting processes. First, multiple data types are not independent. Maturation status is binary with detection error, CCs are non-negative integers, also with detection error, and STs require a transport model (dispersal) linking traps to trees, and identification error in seed identification. Of course, a tree observed to bear seed, now or in the past, is known to be mature now. However, failure to observe seed does not mean that an individual is immature because there are detection errors and failed crop years41,64.
Second, seed production is quasiperiodic within an individual (serial dependence), quasi-synchronous between individuals (“mast years”), [and] there is dependence on environmental variation, and massive variation within and between trees41,53,65. Autoregressive error structures (AR(p) for p lag terms) impose a rigid assumption of dependence that is not consistent with quasiperiodic variation that can drift between dominant cycles within the same individual over time43. It does not allow for individual differences in mast periodicity.
Third, climate variables that affect fecundity operate both through interannual anomalies over time and as [a] geographic variation. The masting literature deals almost exclusively with the former, but our application must identify the latter: the potentially smooth variation of climate effects across regions must be extracted from the many individual time series, each dominated by local “noise.”
Finally, model fitting is controlled by the size classes that dominate a given site and thus is insensitive to size classes that are poorly represented. Large trees are relatively rare in eastern forests, making it hard to identify potential declines in large, old individuals41,53. Conversely, the shade-intolerant species that dominate second-growth forests often lack the smaller size classes needed to estimate maturation and early stages where fecundity may be increasing rapidly.
Several of the foregoing challenges are resolved in the MASTIF model by introducing latent states for individual maturation status and tree-year seed production. The dependent data types (maturation status, CCs, STs) become conditionally independent in the hierarchical MASTIF model (e.g., ref. 66). The serial dependence is handled as a conditional hidden Markov process for maturation that combines with CCs and STs by way of stochastic (latent) conditional fecundity. Maturation status and conditional fecundity must be estimated jointly, that is, not with separate models. The latent maturation/fecundity treatment avoids imposing a specific AR(p) structure. In the MASTIF model there is a posterior covariance in maturation/fecundity across all tree-year estimates that need not adhere to any specific assumption20. The dependence across individuals and years is automatic and available from the posterior distribution.
Separating the spatial from temporal components of climate effects is possible here, not only because the entire network is analyzed together but also because predictors in the model include both climate norms for the individual sites and interannual anomalies across sites35,52. TA depends on both of these components.
Extracting the trends from volatile data further benefits from random individual effects for each tree and the combination of climate anomalies and year effects over time. A substantial literature focuses on specific combinations of climate variables that best explain year-to-year fecundity variation, including combinations of temperature, moisture, and water balance during specific seasons over current and previous years19,25,41. Results vary for each study, presumably due to the differences in sites, species, size classes, duration, data type, and modeling assumptions. For TA, the goal is to accommodate the local interannual variation to optimize identification of trends in space and time. Thus, we include a small selection of important climate anomalies (spring minimum T of the current year, summer T of the current and previous year, and moisture D of the current and previous year). The climate anomalies considered here do not include every variable combination that could be important for all size classes of every species on every site. For this reason, we combine climate anomalies with year effects. Year effects in the model are fixed effects within an ecoregion and random between ecoregions (ecoregions are shown in Fig. 2 and listed in Supplementary Data 2). They are fixed within an ecoregion because they are not interpreted as exchangeable and drawn at random from a large population of possible years. They are random between ecoregions due to the uneven distribution of sites (Supplementary Data 1)20.
To optimize inference on size effects, the sampling of coefficients in posterior simulation is implemented as a weighted regression. This means that the contribution of tree diameter to fecundity is inversely proportional to the abundance of that size class in the data. This approach has the effect of balancing the contributions of abundant and rare sizes. Identifying size effects further benefits from the introduction of opportunistic field sampling, which can target the large individuals that are typically absent from field study plots.
MASTIF data network
Data included in the analysis come from published and unpublished sources and offer one or both of the two data types, CCs and STs (Supplementary Data 1). Both data types inform tree-year fecundity; they are plotted by year in Fig. 6.
Fig. 6: Distribution of observation trees by year in the North American region of the MASTIF network.
Sites are listed by ecoregion in the Supplementary Data 2.
Full size image
CCs in the MASTIF network are obtained by one of three methods. Most common are counts with binoculars that are recorded with an estimate of the fraction of the crop that was observed. A second CC method makes use of seeds collected per ground surface area relative to the crown area. This method is used where conspecific crowns are isolated and wind dispersal is limited. The crop fraction is the ratio of ground area for traps relative to the projected crown area. Examples include HNHR67 and BCEF68.
A third CC method is based on evidence for past cone production that is preserved on trees. This has been used for Abies balsamea at western Quebec sites69, Pinus ponderosa in the Rocky Mountains70, and for Pinus edulis at SW sites27.
ST data include observations on individual trees that combine with seed counts from traps. Because individual studies can report different subcategories of seeds, and few conduct rigorous tests of viability, we had to combine them using the closest description to the concept of “viable”. For example, we do not include empty conifer seeds. A dispersion model provides estimates of seeds derived from each tree. ST and CC studies are listed in Supplementary Data 1. The likelihoods for CCs and STs are detailed in ref. 20. Individually and in combination, the two data types provide estimates, with full uncertainty, for fecundity across all tree-years.
Fitted species had multiple years of observations from multiple sites, which included 211,146 trees and 2,566,594 tree-years from 123 species. Sites are shown in Fig. 2 of the main text by ecoregion, they are named in Fig. 1 and summarized in Supplementary Data 1. For TA the fits were applied to 7,723,671 trees on inventory plots. Mean estimates for the genus were used for inventory trees belonging to species for which there were not confident fits in the MASTIF model, which amounted to 7.2% of inventory trees. Detailed site information is available at the website MASTIF.
Covariates
Covariates in the model include as main effects tree diameter, tree canopy class (shading), and the climate variables in Fig. 1 of the main text and described in Table 1. A quadratic diameter term in the MASTIF model allows for changes in diameter response with size52. Shade classes follow the USDA Forest Inventory and Analysis (FIA)/National Ecological Observation Network (NEON) scheme that ranges from a fully exposed canopy that does not interact with canopies of other trees to fully shaded in the understory. Shading provides information on competition that has proved highly significant for fecundity in previous analyses41,52.
Table 1 Predictors in the model, not all of which are important for all species.
Full size table
To distinguish between the effects of spatial variation versus interannual variability, spring T and moisture D are included in the model as site means and site anomalies35. Spring minimum T affect phenology and frost risk during flowering and early fruit initiation. Summer mean T (June–August) is included both as a linear and quadratic term. Mean summer T is linked to thermal energy availability during the growing season, with the quadratic term allowing for potential suppression due to extreme heat. Moisture D (cumulative monthly PET-P (potential evapotranspiration[-] minus precipitation) for January–August) is included as a site mean and an annual anomaly. Moisture D is important for carbon assimilation and fruit development during summer in the eastern continent and, additionally, from the preceding winter in the western continent. For species that develop over spring and summer, anomalies incorporate the current and previous year. We did not include longer lags in covariates. For species that disperse seed in spring (Ulmus spp. and some members of Acer), only the previous year was used. Temperature anomalies were included for spring, but not summer, simply to reduce the number of times that temperature variables enter the model, and these two variables tended to be correlated at many sites.
Climate covariates were derived from gridded climate products and combined with local climate monitoring where it is available. Terraclimate71 provides monthly resolution, but it is spatially coarse. For both norms and trends, we used the period from 1990 to 2019 because global temperatures have been increasing consistently since the 1980s, and this span broadly overlaps with fecundity data (Fig. 6). CHELSA72 data are downscaled to a 1 km grid, but it does not extend to 2019. Our three-component climate scaling used regression to project CHELSA forward using Terraclimate, followed by downscaling to 1 km with CHELSA, with further downscaling to local climate data. Even where local climate data exist, they often do not span the full duration of field studies, making the link to gridded climate data important. Local climate data were especially important for mountainous sites in the Appalachians, Rockies, Sierra Nevada, and Cascades.
Of the full list of variables, a subset was retained, depending on species (some have narrow geographic ranges) and deviance information criteria of the fitted model (Supplementary Data 2). Year effects in the model allow for the interannual variation that is not absorbed by anomalies20.
Model fitting and TA
As mentioned above, model fitting applied the hierarchical Bayes model of ref. 20 to the combination of time series and opportunistic observations summarized in Fig. 1. Posterior simulation was completed with Markov chain Monte Carlo based on direct sampling, Metropolis, and Hamiltonian Markov chain. Model fitting used 211,146 trees and 2,566,594 tree-years from 123 species (Supplementary Data 2). Only species with multiple observation years were included.
The climate variable referenced as C in Eq. (1) of the main text is, in fact, a vector of climate variables described in the previous section, spring minimum T, summer mean T, and moisture D (Table 1). The anomalies and year effects in the fitted model contribute to the trends not explained by biogeographic variation as γ in Eq. (1). For main effects in the model, the partial derivatives are fitted coefficients, an example being the response to spring minimum temperature (partial f/partial {T}_{mathrm{sp}}={beta }_{{T}_{mathrm{sp}}}). For predictors involved in interactions, the partial derivatives are combinations of fitted coefficients and variables. For example, the response to moisture D, which interacts with tree size, is (partial [F], f/partial {D}={beta }_{{D}} + beta_{GD}G). The response to diameter G, which is quadratic and interacts with D, is (partial f/partial G={beta }_{G}+2{beta }_{{G}^{2}}G ,+{beta }_{GD}D).
Trend decomposition applied the fitted model to every tree in forest inventories from the USDA FIA program, the Canada’s National Forest Inventory, the NEON, and our MASTIF collaboration. Each tree in these inventories has a species and diameter. For trees that lack a canopy class, regression was used to predict it from distances and tree diameters based on inventories that include both location and canopy class, including NEON, FIA, and the MASTIF network. Although inventories differ in the minimum diameter they record, few trees are reproductive at diameters below the lower diameter limits in these surveys, so the effect on fecundity estimates is negligible. For the indirect effects of climate coming through tree growth rates, the same covariates were fitted to growth as previously defined for fecundity, using the change in diameter observed over multiple inventories. A Tobit model was used to accommodate the fact that a second measurement can be smaller than an earlier measurement. The Tobit thus treats negative growth as censored at zero. TA to inventory plots used 7,717,677 trees. Because not all species in the inventory data are included in the MASTIF network, mean fecundity parameters for the genus were used for unfitted species. Species fitted in the MASTIF network accounted for >90% of trees in inventory plots (Supplementary Data 2).
From the predictive distributions for every tree in the inventory data, we evaluated predictive mean trends aggregated to species and plot in Fig. 2b. We extracted specific terms that comprise the components in Fig. 4 and aggregated them too to the plot averages.
General form for TA
Equation 1 simplifies the model to highlight direct and indirect effects. Again, climate variables and tree size represent only a subset of the predictors in the model that are collected in a design vector ({{bf{x}}}_{t}=[{x}_{1,t},ldots ,{x}_{Q,t}]^{prime}), where the q = 1, …, Q predictors include shading from local competition, individual size, and climate and habitat variables (Table 1). On the proportionate scale, Eq. (1) can be written in terms of all predictors, including main effects and interactions, as
$$frac{{mathrm{d}}f}{{mathrm{d}}t}=mathop{sum }limits_{q=1}^{Q}left(frac{partial f}{partial {x}_{q}}+sum _{q^{prime} in {I}_{q}}frac{partial f}{partial ({x}_{q}{x}_{q^{prime} })}{x}_{q^{prime} }right)frac{{mathrm{d}}{x}_{q}}{{mathrm{d}}t}+gamma$$
(2)
where Iq are variables that interact with xq. In this application, interactions include tree diameter with moisture deficit and diameter squared. Each term in the summation consists of a main effect of xq and interactions that are multiplied by the rate of change in variable xq. For the simple case of only two predictors, Eq. (2) is recognizable as Eq. (1) of the main text, where x1, x2 have been substituted for variables G and C. In our application, predictors include additional climate and shading (Table 1).
Recognizing that environmental variables affect not only fecundity but also growth rate, we extract the size effect, that is, the xq that is G, and incorporate these indirect effects (through growth) by expanding g = dG/dt in Eq. (1) of the main text as
$$g=mathop{sum }limits_{q=1}^{Q}left(frac{partial g}{partial {x}_{q}}+mathop{sum}limits _{q^{prime} in {I}_{q}}frac{partial g}{partial ({x}_{q}{x}_{q^{prime} })}{x}_{q^{prime} }right){x}_{q}+nu$$
(3)
where ν is the component of growth that is not accommodated by other terms. This expression allows us to evaluate the full effect of climate variables, including those coming indirectly through growth.
Connecting fitted coefficients in MASTIF to TA
This section connects the continuous, deterministic Eq. (1) to the MASTIF model20 with the interpretation of responses, direct effects, and full effects of Fig. 5. To summarize key elements of the fitted model20, consider a tree i at site j that grows to reproductive maturity and then produces seed depending on its size, local competitive environment, and climate. We wish to estimate the effects of its changing environment and condition on fecundity using a model that includes spatial variation in predictors that are tracked longitudinally over years t. Fecundity changes through maturation probability ρij(t), which increases as trees increase in size, and through conditional fecundity ψij(t), the annual seed production of a mature tree. Let zij(t) = 1 be the event that a randomly selected tree i is mature in year t. Then, ρij(t) is the corresponding probability that the tree is mature, E[zij(t)] = ρij(t)(ρ is not to be confused with the probability that a tree that is now immature will make the transition to the mature state in an interval dt = 1. That is a different quantity detailed in the Supplement to ref. 41). Fecundity has expected value Fij(t) = ρij(t)ψij(t). On a proportionate (log) scale,
$${f}_{ij}(t)={mathrm{log}},{F}_{ij}(t)={mathrm{log}},{rho }_{ij}(t)+{mathrm{log}},{psi }_{it}(t)$$
(4)
The corresponding rate equation is
$$frac{{mathrm{d}}f}{{mathrm{d}}t}=frac{{mathrm{d}},{mathrm{log}},rho }{{mathrm{d}}t}+frac{{mathrm{d}},{mathrm{log}},psi }{{mathrm{d}}t}$$
(5)
The discretized and stochasticized version of Eq. (1) is
$$frac{{mathrm{d}}{F}_{ij}}{{mathrm{d}}t} = , frac{{F}_{ij,t+{mathrm{d}}t}-{F}_{ij,t}}{{mathrm{d}}t}+{epsilon }_{ij,t}\ = , {{Delta }}{F}_{ij,t}+{epsilon }_{ij,t}$$
(6)
where dt = 1 and ϵij,t is the integration error. When applied to a dynamic process model, this term further absorbs process error (see above), which is critical here to allow for conditional independence where observations are serially dependent. In simplest terms, ϵ is model miss-specification that allows for dependence in data.
The MASTIF model that provides estimates for TA is detailed in ref. 20. Elements of central interest for TA are
$${F}_{ij,t} = , {z}_{ij,t}{psi }_{ij,t}\ left[{z}_{ij,t}=1right] sim , {{Bernoulli}}left({rho }_{ij,t}right)\ {rho }_{ij,t} = , {{Phi }}({{boldsymbol{mu }}}_{ij,t})\ mathrm{log},{psi }_{ij,t} = ,{{bf{x}}}_{ij,t}^{prime}{boldsymbol{beta }}+{h}_{t}left(Tright)+{epsilon }_{ij,t}$$
where μij,t = α0 + αGGij,t describes the increase in maturation probability with size, Φ(⋅) is the standard normal distribution function (a probit), ϵij,t ~ N(0, σ2), and ht(T) can include year effects, h(T) = κt, or lagged effects, (h(T)=mathop{sum }nolimits_{k = 1}^{p}{kappa }_{k}{psi }_{ij,t-k}), that contribute to γ in Eq. (1) of the main text. If year effects are used, then γ includes the trend in year effects. (The generative version of this model writes individual states at t conditional on t − 1 and is given in the Supplement to ref. 20.). If an AR(p) model is used, then γ = κ1 (provided data are not detrended). Random individual effects in the fitted model are marginalized for prediction of trees that were not fitted, meaning that σ2 is the sum of model residual and random-effects variance. Again, the length-Q design vector xij,t includes individual attributes (e.g., diameter Gij,t), local competitive environment, and climate (Table 1). There is a corresponding coefficient vector β.
Moving to a difference equation (rate of change) for conditional log fecundity,
$${{Delta }}{f}_{ij,t}={{Delta }}mathrm{log},{rho }_{ij,t}+{{Delta }}mathrm{log},{psi }_{ij,t}$$
where
$${{Delta }}mathrm{log},{psi }_{ij,t} =mathrm{log},{psi }_{ij,t+1}-mathrm{log},{psi }_{ij,t}\ ={{Delta }}{{bf{x}}}_{ij,t}^{prime}{boldsymbol{beta }}+{gamma }_{ij,t}+{nu }_{ij,t}\ {{Delta }}{{bf{x}}}_{ij,t} ={{bf{x}}}_{i,t}-{{bf{x}}}_{ij,t-1}\ {nu }_{ij,t} sim N(0,2{sigma }^{2})$$
The variance in the last line is the variance of the difference Δϵij,t.
Elements of basic theory in Eq. (1) of the main text are linked to data through the modeling framework as
$${{Delta }}{f}_{ij,t}= +{beta }_{{T}_{sp}}{{Delta }}{T}_{sp,j}\ +left({beta }_{T}+2{beta }_{{T}^{2}}{T}_{j}right){{Delta }}{T}_{j}\ +left({beta }_{D}+{beta }_{GD}{G}_{ij,t}right){{Delta }}{D}_{j}\ +left({alpha }_{G}frac{phi ({{boldsymbol{mu }}}_{ij,t})}{{{Phi }}({{boldsymbol{mu }}}_{ij,t})}+{beta }_{G}+2{beta }_{{G}^{2}}{G}_{ij,t}+{beta }_{GD}{D}_{j}right){{Delta }}{G}_{ij}\ +{gamma }_{ij,t}+{nu }_{ij,t}$$
(7)
where ϕ(⋅) is the standard normal density function that comes from the rate of progress toward maturation. Again, the anomalies do not appear in this expression for trends because trends in the anomalies and year effects enter through γ.
The first four lines in Eq. (7) are, respectively, the effects of trends in spring minimum temperatures ΔTsp,j, summer mean temperature ΔTj, moisture deficit ΔDj, and size ΔGij, where the latter comes from growth on inventory plots. The contribution of maturation to change in fecundity is the first term in the fourth line, αGϕ(μij,t)/Φ(μij,t). A map of this term in Fig. 7b shows the strong contribution to fecundity in the East due to the young (Fig. 7a) and/or small (Fig. 4b) trees there. The sum of these terms dominates the patterns in Fig. 3c.
Fig. 7: Size and maturation effects on fecundity.
a Stand age variable in FIA data and b positive effect of maturation for increasing fecundity in the eastern continent. In the West, maturation does not contribute to rising fecundity because large trees are predominantly [mature] larger.
Full size image
All terms in Eq. (7) have units of mean change in proportionate fecundity, and these are mapped in figures of the main text. We focus on proportionate fecundity because it reflects the full effect of climate as opposed to total fecundity, which would often be dominated by one or a few trees of a single species. However, from proportionate fecundity we can obtain change in fecundity as ΔFij,t = Fij,t × Δfij. Stand-level effects on fecundity change at site j can be obtained from individual change as
$${{Delta }}{F}_{j}=mathop{sum }limits_{i=1}^{{n}_{j}}{{Delta }}{F}_{ij}=mathop{sum }limits_{i=1}^{{n}_{j}}{F}_{ij}{{Delta }}{f}_{ij,t}$$
Again, maps in Fig. 5 show mean proportionate effects for all trees on an inventory plot.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article. More