Global patterns of allometric model parameters prediction

Data collection

Peer-reviewed articles published up to Dec 31, 2021 were searched through the Web of Science (http://webofknowledge.com), Google Scholar (http://scholar.google.com), and the China National Knowledge Infrastructure (CNKI, http://www.cnki.net). We used a combination of the following search terms: “(tree biomass OR aboveground biomass OR plant biomass OR plant productivity) and (allometric biomass equation OR allometric model OR productivity model OR biomass equation OR biomass model)”. To avoid potential selection bias and duplicates, we cross-checked the reference lists of relevant articles, which yielded 729 relevant articles from the thousands initially retrieved. Eligible articles were then selected using the following criteria: (1) Allometric models built for specific species at confirmed, undisturbed locations were selected; models for generalized species, for large scales (e.g., province or nation), or for recently disturbed trees were excluded. (2) Models developed by destructive harvesting and weighing of at least twenty sample trees were selected; articles that did not include measurements or that used fewer than twenty sample trees were excluded. (3) Models of the form W = a*D^b or LnW = a + b*Ln(D), or W = a*(D²H)^b or LnW = a + b*Ln(D²H), were selected, where W is the aboveground biomass, D is the diameter at breast height, and H is the tree height; articles with other variables or other model forms were excluded. Finally, 426 articles remained from the original 729 (Supplementary Fig. S1).

We then extracted the following variables from the articles: (1) Allometric models in the form W = a*D^b or LnW = a + b*Ln(D), or W = a*(D²H)^b or LnW = a + b*Ln(D²H), including the parameters a and b and the D and H ranges. (2) Tree species corresponding to the models, including families, genera, and species. (3) Location data, including longitude, latitude, and study sites. (4) Climate data, including mean annual temperature (MAT, °C) and mean annual precipitation (MAP, mm) at the tree species location. (5) Terrain data, including slope and aspect. (6) Soil data, including soil organic carbon (SOC), clay, and soil type.

Since not all articles provided the location, climate, soil, and terrain data of the studies, we estimated the missing data as follows: (1) We derived missing longitude and latitude from the reported study location using Google Earth. (2) We extracted missing climate data at the geographic coordinates from WorldClim version 2.0 (http://worldclim.org/current)¹⁶. (3) We obtained Shuttle Radar Topography Mission (SRTM) DEM data at 30 m resolution from NASA and used SAGA-GIS software to derive terrain variables such as altitude, slope, and aspect from the DEM^{17, 18}. (4) Missing soil data were derived from the Regridded Harmonized World Soil Database v1.2¹⁹. In particular, we assigned the soil type according to Soil Taxonomy to increase the accuracy of the analysis and prediction. Furthermore, if a study performed experiments at multiple sites, each site was treated as an independent observation. Applying the above criteria, we collected 817 allometric models in the form of W = a*D^b or LnW = a + b*Ln(D) and 612 allometric models in the form of W = a*(D²H)^b or LnW = a + b*Ln(D²H) from the 426 articles.

Allometric model

The relationship between the diameter and aboveground biomass was in the form of the power function²⁰:

$$\begin{array}{c}W_{i}=a\times D_{i}^{b},\end{array}$$

(1)

where Wi is the dry mass of the ith tree (kg), Di is diameter at breast height (cm), and a and b are the parameters of the model.

$$W_{i}=a\times \left(D_{i}^{2}H_{i}\right)^{b},$$

(2)

where Wi is the dry mass of the ith tree (kg), Di is diameter at breast height (cm), Hi is the tree height (cm), and a and b are the parameters of the model.
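As a minimal sketch of how Eqs. (1) and (2) turn tree measurements into a biomass estimate, the Python functions below evaluate both power forms. The function names and the parameter values in the example are hypothetical and chosen only for illustration; they are not taken from the collected models.

```python
def biomass_from_d(a, b, d):
    """Eq. (1): W = a * D^b, with D the diameter at breast height."""
    return a * d ** b


def biomass_from_d2h(a, b, d, h):
    """Eq. (2): W = a * (D^2 * H)^b, with H the tree height.

    D and H must be in the units the parameters were fitted with.
    """
    return a * (d ** 2 * h) ** b


# Hypothetical parameters and measurements, for illustration only.
w_d = biomass_from_d(a=0.1, b=2.5, d=20.0)
w_d2h = biomass_from_d2h(a=0.05, b=0.9, d=20.0, h=1500.0)
```

Both functions return W in the mass unit implied by the fitted parameter a.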

However, heteroscedasticity arises when tree biomass is fitted directly. Log-transforming Eq. (1) or Eq. (2) facilitates model fitting and reduces heteroscedasticity²¹. The log-transformed allometric model:

$$\begin{array}{c}\mathrm{Ln}\left(W_{i}\right)=a+b\times \mathrm{Ln}\left(D_{i}\right),\end{array}$$

(3)

was used, where a in Eq. (3) corresponds to Ln(a) in Eq. (1), and b in Eq. (3) is the same as b in Eq. (1).

$$\begin{array}{c}\mathrm{Ln}\left(W_{i}\right)=a+b\times \mathrm{Ln}\left(D_{i}^{2}H_{i}\right),\end{array}$$

(4)

Equation (4) was used likewise, where a in Eq. (4) corresponds to Ln(a) in Eq. (2), and b in Eq. (4) is the same as b in Eq. (2). To unify the models, we transformed collected models in the form of Eq. (1) into Eq. (3) and those in the form of Eq. (2) into Eq. (4).
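The unification step and the log-log fit can be illustrated with a short Python sketch. It assumes ordinary least squares on the log-transformed values, which is one common way to fit Eqs. (3) and (4); the function names and the noise-free example data are hypothetical.

```python
import math


def to_log_form(a_power, b):
    """Map (a, b) of W = a * X^b to (Ln a, b) of Ln(W) = Ln a + b * Ln(X)."""
    return math.log(a_power), b


def fit_log_log(xs, ws):
    """Ordinary least squares of Ln(W) on Ln(X); returns (intercept, slope)."""
    lx = [math.log(x) for x in xs]
    lw = [math.log(w) for w in ws]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_w = sum(lw) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxw = sum((x - mean_x) * (w - mean_w) for x, w in zip(lx, lw))
    slope = sxw / sxx
    return mean_w - slope * mean_x, slope


# Hypothetical noise-free data generated from W = 0.1 * D^2.5.
diameters = [5.0, 10.0, 20.0, 40.0]
weights = [0.1 * d ** 2.5 for d in diameters]
intercept, slope = fit_log_log(diameters, weights)
```

On this synthetic data the fitted intercept matches Ln(0.1) and the slope matches 2.5, which is the correspondence between the power and logarithmic forms described above. For Eq. (4), the same functions apply with X = D²H.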

Data analysis

To relate the predictor variables to parameters a and b and predict the parameters on a global scale, we employed Random Forest (RF), a machine learning method that consists of an ensemble of randomized classification and regression trees (CART)²¹. In short, RF generates a number of trees and aggregates them to provide a single prediction. In regression problems the prediction is the average of the individual tree outputs, whereas in classification the trees vote by majority on the class^{22, 23}. Each of the n_tree trees is grown on a bootstrapped sample of about two thirds of the original data, so that different training sets reduce the correlation among trees¹⁵. In addition to this standard bagging step, the best split at each node is searched only among a randomly selected subset (m_try) of predictors²⁴. Tree growing proceeds recursively until the node size reaches a user-specified minimum, k. The remaining data, called Out-Of-Bag (OOB) data, provide a reliable error estimate: they are used to obtain a running unbiased estimate of the error as trees are added to the forest¹⁵.
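The bootstrap and OOB mechanics described above can be sketched in plain Python. This is a toy illustration, not the RF implementation used in the study: each "tree" is replaced by a deliberately trivial stand-in that predicts the mean of its bootstrap sample, so only the resampling, averaging, and OOB error logic are shown. All names and data are hypothetical.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    in_bag_set = set(in_bag)
    oob = [i for i in range(n) if i not in in_bag_set]
    return in_bag, oob


def oob_mse(y, n_tree, seed=0):
    """OOB mean squared error of a bagged ensemble of mean predictors."""
    rng = random.Random(seed)
    n = len(y)
    sums = [0.0] * n
    counts = [0] * n
    for _ in range(n_tree):
        in_bag, oob = bootstrap_split(n, rng)
        # Stand-in for a fitted tree: the mean of the bootstrap sample.
        tree_pred = sum(y[i] for i in in_bag) / n
        for i in oob:
            # Only trees that never saw observation i predict it.
            sums[i] += tree_pred
            counts[i] += 1
    errors = [(sums[i] / counts[i] - y[i]) ** 2
              for i in range(n) if counts[i] > 0]
    return sum(errors) / len(errors)


y = [float(v) for v in range(1, 31)]  # hypothetical responses
in_bag, oob = bootstrap_split(len(y), random.Random(1))
mse = oob_mse(y, n_tree=200)
```

Because each bootstrap sample is drawn with replacement, roughly one third of the observations are left out of any given tree, which is what makes the OOB error estimate possible without a separate test set.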

Predictive variable selection

The candidate variables included stand factors such as density, family, and diameter, as well as non-stand factors such as MAT, MAP, and SOC. Because the prediction was made on a global scale, we first excluded factors that could not be extracted completely. Next, we selected variables as follows²²: (1) the RF classifier was first applied using all of the predictor variables, which were ranked by variable importance based on the mean decrease in accuracy; (2) the least important variables in this ranking were removed; (3) the training data were then partitioned into five folds for cross-validation, the error rates of the five partitions were aggregated into a mean error rate, and 20 replicates of the five-fold CV were performed²⁵.
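The repeated five-fold cross-validation in step (3) can be sketched as follows. The sketch assumes that the mean error of each replicate is averaged over the 20 replicates; `error_fn` is a hypothetical callback standing in for fitting RF on the training indices and scoring it on the held-out indices, and the example model simply predicts the training mean.

```python
import random


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_error(n, k, repeats, error_fn, seed=0):
    """Mean over replicates of the mean error over the k folds."""
    rng = random.Random(seed)
    replicate_means = []
    for _ in range(repeats):
        fold_errors = []
        for held_out in kfold_indices(n, k, rng):
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            fold_errors.append(error_fn(train, held_out))
        replicate_means.append(sum(fold_errors) / k)
    return sum(replicate_means) / repeats


y = [float(v) for v in range(50)]  # hypothetical responses


def mean_model_mse(train, test):
    """Placeholder model: predict the training mean, score by MSE."""
    mu = sum(y[i] for i in train) / len(train)
    return sum((y[i] - mu) ** 2 for i in test) / len(test)


cv_error = repeated_cv_error(len(y), k=5, repeats=20, error_fn=mean_model_mse)
```

Repeating the partitioning with fresh shuffles reduces the dependence of the error estimate on any single random split.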

Through this procedure, eleven variables (family, genus, species, MAT, MAP, altitude, aspect, SOC, slope, clay, and soil type) were retained to predict the parameters. Five different combinations of these eleven variables were then tested. Each combination was used in RF to make predictions, and the best combination was selected according to two model evaluation indices, the percentage of variance explained (% Var explained) and the mean of squared residuals (Supplementary Table S1).

Optimization of Random Forest parameters

RF depends primarily on three parameters that are set by the user: (1) n_tree, the number of trees in the forest; (2) nodesize, the minimum number of data points in each terminal node; and (3) m_try, the number of features tried at each node. To optimize these parameters, we tested n_tree = 1000, 2000, and 3000, and the selection criterion was that n_tree be small enough to maximize computational efficiency while still producing a stable OOB error²⁵. For nodesize, we tested 3, 5, and 7, where 5 is the default for regression RF. The default m_try for regression is one third of the number of variables; here we tested m_try values ranging from 2 to 4 and assessed the OOB error rates from 50 replicates for each m_try value²⁵. In addition to tuning each parameter individually, we evaluated every combination of the three RF parameters through a grid search, and the final RF parameters were set according to the predictive performance of each combination (Supplementary Table S2).
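A grid search over the three parameters can be sketched with `itertools.product`. The candidate values below follow the text (nodesize is read as 3, 5, 7 and m_try as 2 to 4); `score_fn` is a hypothetical callback that would return the OOB error of an RF fitted with the given settings, and `toy_score` is a made-up placeholder so that the sketch runs on its own.

```python
from itertools import product

N_TREE = [1000, 2000, 3000]
NODESIZE = [3, 5, 7]
M_TRY = [2, 3, 4]


def grid_search(score_fn):
    """Return (error, n_tree, nodesize, m_try) with the lowest error."""
    best = None
    for n_tree, nodesize, m_try in product(N_TREE, NODESIZE, M_TRY):
        err = score_fn(n_tree, nodesize, m_try)
        if best is None or err < best[0]:
            best = (err, n_tree, nodesize, m_try)
    return best


def toy_score(n_tree, nodesize, m_try):
    """Placeholder error surface, NOT a real OOB error."""
    return abs(nodesize - 5) + abs(m_try - 3) + n_tree / 1e6


best = grid_search(toy_score)
n_combinations = len(list(product(N_TREE, NODESIZE, M_TRY)))
```

With three candidate values per parameter the grid contains 27 combinations. The strict `<` comparison keeps the first of any tied settings, which here favours the smaller n_tree because it is iterated first.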

All of the above analyses were conducted in R 4.0.3²⁶. The output is the spatial pattern of allometric model parameters at 0.5° resolution.

Predicted parameter validation

To further assess the accuracy of the predicted parameters, we applied them to estimate the AGB at six sites. The actual AGB at these sites had been obtained via destructive sampling from 209 plots located in Hubei, Liaoning, Gansu, Hebei, and Heilongjiang provinces and the Inner Mongolia Autonomous Region from 2009 to 2013²⁷ (Table 1). First, we selected dominant, average, and inferior sample trees outside the plots. The sample trees were then felled as carefully as possible, and tree height (H), diameter at breast height (DBH), and live crown length were recorded. Each tree was divided into sub-samples of branches, leaves, stem wood, and stem bark: all branches were removed and the leaves were picked, the stem was cut into 1 m sections, and the bark was removed from the stem. Finally, all aboveground sub-samples were oven-dried at 80 °C to constant weight, and the sum of the sub-sample weights was taken as the actual AGB. This process yielded 249 actual AGB values. The predicted AGB was calculated from the predicted model parameters together with DBH and H. The actual AGB of the 249 sample trees was then compared with the predicted AGB by fitting curves between them in R, and the applicability of the predicted parameters was evaluated using the root mean square error (RMSE) and R².
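The two validation metrics can be written out as a short Python sketch. It assumes R² is computed as 1 − SS_res/SS_tot between actual and predicted AGB; the R² of a curve fitted between the two series, as described above, may differ from this definition. The function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives RMSE = 0 and R² = 1; RMSE is expressed in the same unit as the AGB values.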

Table 1 The basic features of the sampling sites.


The experimental research and field studies on plants in this study, including the collection of plant material, complied with the relevant institutional, national, and international guidelines and legislation. Permission was obtained for the plant sampling, and all steps of the plant research were permitted. In addition, plant identification in this study was conducted by X.Z according to World Plants (https://www.worldplants.de) in the herbarium of School of Forestry & Landscape of Architecture, Anhui Agricultural University, and voucher specimens of all plant material have been deposited in a publicly available herbarium.

Source: Ecology - nature.com
