Study area
The study area is a one-hectare (ha) subtropical mixed young forest in the age of 20 to 30 years of Hong Kong. It is located at Shek Kong (22.428774, 114.114968), in between the Kadoorie Farm and Botanic Garden (KFBG) and Kadoorie Institute, The University of Hong Kong (HKU) as shown in Fig. 1. This is a forest dynamic plot under ‘The Center for Tropical Forest Science–Forest Global Earth Observatory (CTFS-ForestGEO) (https://forestgeo.si.edu/). It acts as a research base on the forest dynamics, forest biodiversity, carbon sequestration and more, which also provide opportunity for public involvement in scientific research.
The study area: 1 hectare subtropical moist young forest plot, a full demonstration plot of the Forest Global Earth observatory (ForestGEO) project (https://forestgeo.si.edu/), located in Hong Kong in understanding the long-term forest dynamic. The map is generated by Authors using ArcMap version 10.5 (https://desktop.arcgis.com/en/arcmap/).
The one-hectare forest plot was demarcated by 25 quadrats in 20 m × 20 m each, which were further sub-divided into sixteen 5 m × 5 m sub-quadrats; delineated with permanent marker poles by the professional surveying team. The forest survey was launched on January 11th, 2012 and completed on September 6th, 2012. A total of 63 species, 10,442 individual trees with 20,888 stems were recorded in the site. The stem locations (UTM WGS84 coordinate system) were recorded in nearest 5 cm, and DBH were measured at 1.3 m at breast height in nearest 1 mm. The dominant species are Litsea rotundifolia, Psychotria asiatica, Ilex asprella and Aporosa dioica, which all are native species and accounted for over 80% of stems with the study area. The forest condition of the study area is shown in Table 2.
Methods
A three-stage methodological framework was outlined for this study as shown in Figs. 2 and 3. In the first stage, parameters including DBH, wood density and stem location were recorded for field-measured AGB computation. DBH was directly obtained from site, wood density (in g cm−3) was obtained from the global wood density database43,44 or World Agroforestry database45 (http://db.worldagroforestry.org//wd), up to species level. If the species was not recorded in the database, the value was replaced by its genus averaged wood density value. Tree height was derived from the LiDAR data with reference to the recorded x,y location of the stems. The second stage is LiDAR AGB model derivation. Five allometric models (Model 1 to Model 5) were used to estimate the AGB of individual trees based on field-measured parameters. The allometric model with the lowest model error was selected to compute the ‘Field-measured AGB’, which would be used as the dependent variable to develop the LiDAR-derived AGB model. The LiDAR plots metrics were generated in various plot-size (i.e., 10 m radius, 5 m radius and 2.5 m radius) within the study area. Stepwise linear regression was used to select the important and significant predictors in each regression model. Three different model forms were tested (Model I, II, III as discussed in “Stage 2: LiDAR AGB model derivation” section), The LiDAR derived AGB regression models would be evaluated by means of assumption tests and bootstrapping; as well as cross-validation (CV) in the last stage of model evaluation and validation.
The workflow of the study. Stage 1: allometric modeling to compute ground-truth AGB. Five allometric models were compared and the best one is selected to calibrate model in the next stage. Stage 2: LiDAR AGB Model Derivation, by comparing LiDAR plot metrics derived in different plot size (i.e. 10 m, 5 m, 2.5 m radius). Stage 3: Model Evaluation & Validation on the best model selected from Stage 2 (Source: Authors).
The graphical abstract of the study (Source: Authors).
Stage 1: allometric modeling
20,888 stems from 63 species, were planned to be used to develop the allometric models. Amongst, the 81% of the wood density value were measured up to species level (i.e., 51 species) and 19% were up to genus level (i.e., 12 species). However, as the tree height parameter was retrieved from the LiDAR 1 m Canopy Height Model (CHM), 263 stems found no value and thus the sample size reduced to 20,625 stems. The DBH of stems ranged from 1.0 cm to 57.1 cm, with mean of 2.55 cm and S.D. of 2.38 cm.
The selected five allometric models included two pantropical models by Chave et al.16; two subtropical models by Xu et al.46 and one local model by Nichol and Sarker47 were assessed. The inclusion of the subtropical models from Xu et al.46 is because of the similar forest type (i.e., subtropical moist forest from southern China) with our study area; whereas the pantropical models from Chave et al.16 is due to its large database across the pantropical regions yet not being explored adequately its applicability in subtropical forests.
The five allometric models were compared by One-way ANOVA and the Tukey HSD post-hoc analysis48. Model 1 (Eq. (3)) was the best model proposed by Chave et al.16, comprised of wood density (ρ), diameter-at-breast height (DBH) and height (H) as shown below (AIC = 3130):
$$AGB=0.0673times {left(rho {(DBH)}^{2}Hright)}^{0.976}$$
(3)
However, it is very unlikely to have a precise tree height data in a closed canopy49. Therefore, an alternative allometric equation without height was then proposed by Chave et al.16 and adopted as Model 2 (Eq. (4)) in our study (AIC = − 4293):to represent the subtropical AGB
$$AGB=expleft[-1.803-0.976E+0.976text{ln}left(rho right)+2.673text{ln}left(DBHright)-0.0299left[{(text{ln}(DBH))}^{2}right]right]$$
(4)
The bioclimatic variable, “E”, which made up of three parameters to account for climatic variations: Temperature seasonality (TS), Long-term Maximum Climatological Water Deficit (CWD) and Precipitation Seasonality (PS). The formula of the bioclimatic variable, “E”, is shown below (Eq. (5)):
$$E={left(0.178 times TS-0.938 times CWD-6.61 times PSright)}^{{10}^{-3}}$$
(5)
The monthly mean temperature (℃), precipitation (mm) and evapotranspiration (mm) data were obtained from the Hong Kong Observatory50 in the period of 1981 to 2010 to compute the three variables. E was then computed as 0.261 and was input into Eq. (4) to derive the ground-truth AGB of individual stems.
The AGB models developed by Xu et al.46 were conducted in the subtropical mixed-species moist forest in southern part of China. Two allometric models from the study were selected to represent the subtropical AGB model. Model 3 (Eq. (6)) assumed tree height is available (by extracting from LiDAR):
$$AGB=text{exp}(-2.334+2.118text{ln}(DBH)+0.5436text{ln}(H)+0.5953text{ln}(rho ))$$
(6)
Meanwhile, an alternative allometric model would be used by assuming tree height was not available, which is the Model 4 (Eq. (7)) of this study:
$$AGB=text{exp}(-1.8226+2.4105text{ln}left(DBHright)+0.5781text{ln}(rho ))$$
(7)
Nichol and Sarker47 developed a local allometric model for Hong Kong by harvesting 75 trees from 15 dominant species of Hong Kong. DBH and H within 50 circular sample plots from a variety of tree stands were measured, which were then used to establish allometric model with field measured AGB. The best allometric model was a model using DBH as the sole parameter and thus selected as the Model 5 (Eq. (8)) of this study:
$$AGB=text{exp}(-1.8226+2.4105text{ln}left(DBHright)+0.5781text{ln}(rho ))$$
(8)
Stage 2: LiDAR AGB model derivation
The airborne LiDAR data used in this study was captured by Optech Gemini ALTM Airborne Laser Terrain Mapper and acquired by The Government of the Hong Kong Special Administrative Region from December 1st, 2010 to January 8th, 2011. A total of 5575 ground truth points were generated, with horizontal accuracy of 0.294 m (95% confidence interval (C.I.)). Vertical accuracy was also assessed against the orthometric heights of Hong Kong Principal Datum (HKPD), the average vertical accuracy was 0.1 m (95% C.I.). Multiple returns were recorded per pulse up to four range measurements (i.e., first, second third and last). The LiDAR acquisition parameters are shown in Table 3.
Prior to generation of the plot metrics, the LiDAR data was pre-processed by creating the ground TIN by extracting only the ground returns. The extraction and processing were performed in ArcMap 10.5 with LAStools extension (https://rapidlasso.com/lastools/) and the FUSION 3.7 (http://forsys.cfr.washington.edu/fusion/fusionlatest.html). The canopy surface was defined using all non-ground returns with height above 2.0 m. The reason of choosing 2.0 m as the threshold was to avoid canopy returns to be mixed with ground returns51. Moreover, ‘all returns’ were used instead of the ‘first returns’, since the former provided more information on the lower canopies or understory51, while over 50% of returned points in our study area were classified as medium or low vegetation. The ground TIN was subtracted from the canopy surface to compute the normalized height value of each canopy point. The LiDAR metrics were then derived from the normalized height point cloud.
To be compatible with the LiDAR dataset and to facilitate the plot delineation procedure, the entire 1 ha study area was divided into circular plots with three plot sizes: (1) twenty-five 10 m radius plots (16,182 stems), (2) one-hundred 5 m radius plot (15,538 stems) and (3) four-hundred 2.5 m radius plots (15,100 stems) as shown in Fig. 4a–d. Circular plots were considered more favorable than rectangular or square plots, since the periphery-to-area ratio was the smallest and thus minimized the number of edge trees52.
The (a) 1 ha rectangular plot clipped into (b) twenty-five 10 m radius circular plots; (c) one-hundred 5 m radius plots; and (d) four-hundred 2.5 m radius plots respectively; for the LiDAR metrics derivation. The maps are generated by Authors using ArcMap version 10.5 (https://desktop.arcgis.com/en/arcmap/).
The 59 plot metrics, under the descriptive, height, intensity and canopy cover categories, were derived by the ‘cloudmetrics’ function in FUSION version 3.7. The LiDAR metrics, and its log-transformed metrics were input into stepwise regression model as independent predictors of AGB. Significant predictors were selected (F < 0.5 were entered and F > 1.0 were removed) into the regression model. Three sets of regression models were generated (Table 4) and the allometric models tended to be linear and normal after logarithmic transformation18,19.
Observing the normal Q–Q plot (Fig. 5a) of the dependent variable (i.e., AGB) and the Kolmogorov–Smirnov statistic (d = 0.216, p < 0.05), it indicated the raw AGB (i.e., Model I) was non-normal. Therefore, AGB was then log-transformed into the log-unit (i.e., Model II); the Kolmogorov–Smirnov statistic (d = 0.135, p > 0.200) indicated a normal distribution (Fig. 5b) and which became the Model II. The further log-transformation into Model III did not indicate significant improvement and thus Model II in various plot sizes were to be further explored.
The normal Q–Q plot, in 10 m radius plot size, of (a) raw AGB and (b) log-AGB. After logarithmic transformation, the AGB (dependent variable) was normalized.
Assumption tests including test on normality, homoscedasticity and absence of multi-collinearity were conducted. The Normal P–P plot, scatter plots on residuals, ‘Tolerance’ index and Variance Inflation Factor (VIF) were applied to detect the violation of assumptions. If the ‘Tolerance’ index is smaller than 0.1, or VIF is greater than 10, it indicates a significance chance of collinearity53.
Stage 3: model evaluation and validation
AGB regression models were to be evaluated in this stage. The Model R2, Adjusted R2, Mean-absolute-deviation (MAE) and Root-mean-squared-error (RMSE) were reported to indicate the explanatory power of the model. R2 showed the amount of explained variance by the model, MAE was the average magnitude of error without considering the direction (Eq. (9)). RMSE (Eq. (10)) was the square root of the averaged squared residual and being sensitive to outliers (i.e., larger error) as the errors are squared. The RMSE would be reported in the unit of kg/ha.
$$MAE =frac{sum_{i=1}^{n}left|(widehat{y}-y)right|}{n}$$
(9)
$$RMSE= sqrt[]{frac{{{sum }_{i=1}^{n}(widehat{y}-y)}^{2}}{n}}$$
(10)
(widehat{text{y}}) is the ith estimate for AGB of each plot, y is the ith observation of that plot, divided by the (n), denoting the sample size.
However, as the regression model was built with limited sample plots, it might subject to model overfitting. Bootstrapping was adopted to assess statistical accuracy in terms of the confidence intervals54. Ultimately, it provides a robust estimation of the standard errors, confidence intervals for estimates, including the model regression coefficient, mean and correlation coefficients. The model uncertainty of this study was assessed by 1000 runs of bootstrapping. The 95% “Bias-corrected and accelerated (BCa) confidence interval (C.I.)” on the beta coefficient of model predictors was reported.
Leave-one-out cross validation (LOOCV) was utilized for model validation. The model would be trained by all data points except the one being left out for validation55. The Predicted Residual Error Sum of Squares, PRESS, was the sum of the ‘squared deleted residuals’ (SSDR) of the n−1 observation. Predicted R2 was computed by dividing PRESS by the total sum of square residual (SSTO) expressed in Eq. (11). The RMSE of the CV model was also reported as Eq. (12). The RMSE of the CV model without overfitting shall be approximate to that of the original model.
$${R}_{pred}^{2}=1-frac{PRESS}{SSTO}$$
(11)
$$C{V}_{RMSE}= sqrt[]{frac{PRESS}{d.f.}},$$
(12)
d.f. stands for degree of freedom (i.e., d.f. = 21). These model calibration and validation work were conducted in IBM SPSS Statistics version 24.
Source: Ecology - nature.com