Reference carbon cycle dataset for typical Chinese forests via colocated observations and data assimilation

We selected 10 permanent plots with long-term observations from CERN to include typical forest types of various ages in the East China monsoon forest region, including tropical rainforests, subtropical evergreen coniferous and broad-leaved mixed forests, warm temperate deciduous broad-leaved forests and temperate coniferous and broad-leaved forests, with evident precipitation and temperature gradients from south to north (Fig. 1). The spatial representativeness of the selected 10 sites across the Chinese forest region was evaluated by calculating the Euclidean distance based on various environmental factors. The 10 sites performed well and represented more than 80% of the Chinese forest region (Fig. S1). Of these forests, the Xishuangbanna tropical seasonal rainforest (BNF), Dinghu Mountain subtropical evergreen coniferous and broad-leaved mixed forest (DHF), Ailao Mountain subtropical evergreen broad-leaved forest (ALF), and Changbai Mountain temperate deciduous coniferous and broad-leaved mixed forest (CBF) are mature natural forests; the Shennongjia subtropical evergreen deciduous broad-leaved mixed forest (SNF) and Huitong subtropical evergreen broad-leaved forest (HTF) are natural secondary forests; and the other sites, i.e., the Beijing warm temperate deciduous broad-leaved mixed forest (BJF), Maoxian warm temperate deciduous coniferous mixed forest (MXF), Qianyanzhou subtropical evergreen artificial coniferous mixed forest (QYF), and Heshan subtropical evergreen broad-leaved forest (HSF), are plantations or middle- and young-age forests. All 10 sites are well protected and subject to minimal human activities, thus reflecting the C cycle dynamics under global environmental change, e.g., climate change, increasing CO₂ and nitrogen deposition. The detailed characteristics of each plot can be found in their profiles in the CFCCD database.

There are three main steps to create the observation-based basic dataset and assimilated dataset of typical Chinese forests C cycle dynamics:

1.
Observation-based basic data acquisition. An ensemble of daily atmospheric and water data at ten CERN sites were used as forcing datasets for MDF and future scientific analysis; biological and soil data were also collected from CERN and processed by quality control and statistical calculation as benchmark to constrain the model.
2.
Implementation of a multiple data-model fusion framework. The Markov Chain Monte Carlo (MCMC) that integrated the Data Assimilation Linked Ecosystem Carbon (DALEC) model with multiple and dynamic observational data was used to retrieve C-cycle process parameters in a realistic disequilibrium state.
3.
Key process parameters and C function data assimilation. The key parameters of the process-based C cycle model (DALEC) were determined via the model-data fusion method; then the ecosystem C sequestration datasets were simulated by forward running the DALEC model with optimized parameters and then validated based on observational data and other previous studies.

Each step is explained in more detail below.

Observation-based basic data acquisition

Atmospheric and water data

In situ observations of daily air temperature (Ta), photosynthetically active radiation (PAR), relative humidity (RH), precipitation (Precip), and soil moisture (Sw) at the 10 sites from 2005 to 2015 were obtained from the CERN scientific and technological resources service system (http://www.cnern.org.cn/). These atmospheric and water data were mostly observed by an automatic meteorological station at each site. Among them, the PAR was estimated by a LI-COR LI-190SZ Quantum Sensor; Ta and RH were measured by a QMT110 sensor; Sw was estimated by a soil moisture neutron probe or the Time-Domain Reflectometry (TDR) soil moisture probe; the associated saturated soil water capacity (Sc) was measured by the cutting ring method to sample soil in each field campaign and the oven-drying method to measure saturated moisture content after the soil was soaked in water for 48 h; and Precip was artificially observed by CERN staff using an SM1-1 rain gauge. These monitoring data were collected in keeping with CERN’s protocols of observation and quality control^29,30.

There were occasional missing data in time-continuous meteorological observations; therefore, the data were processed by standardized gap filling³¹. Specifically, for Ta, PAR, and RH, which were applied as model driver, we used a linear interpolation method to interpolate continuous missing data with less than three observations; otherwise, we established a regression model using the CERN observations and other observations from adjacent stations of the China Meteorology Administration (756 meteorological stations; http://data.cma.cn/en) to interpolate continuous missing data with more than three observations.

Biological data

Biomass.

At each site, the diameters at breast height (DBHs) and tree heights were measured for each tree in a regular inventory performed at least once every five years. The allometric equations of the DBH and/or tree heights with the biomasses of different plant tissues (i.e., leaves, branches, stems and roots) were developed at each site for various species based on the felled standard trees in the destructive plot. Then, we calculated the biomasses for the ten ecosystems using these allometric equations (FA02 table downloaded from http://www.cnern.org.cn/), which all passed the significance test (0.01 level) and have the R² most above 0.9 when its estimation compare to observations from standard trees. For some unfelled species under protection, the allometric equations were obtained from Luo et al.³², which were developed based on national inventories and meta analyses from the published literature.

Litterfall.

The aboveground litterfall biomass was measured monthly by ten replicates with 1 m × 1 m baskets during the growing season or once during the nongrowing season. All collected litter was dried at 70 °C for 24 h in the laboratory and then weighed. To avoid the effects of wind on the measurement of litterfall biomass within an individual month, annual litterfall biomass data were finally adopted for each site.

LAI.

The leaf area index (LAI) at each site was measured optically with an LAI-2000 plant canopy analyzer (LI-COR, Lincoln, NE, USA) at least quarterly every year.

Soil data

Soil organic matter (SOM) was measured by the potassium dichromate oxidation titrimetric method. Soil bulk density (SBD) was measured by the cutting ring method in each field campaign at 10 forest sites. Soil particle size (i.e., soil mechanical composition) was measure by the laser particle analyzer. At least three samples were collected from each of the five soil layers (0–10, 10–20, 20–40, 40–60, and 60–100 cm) once every five years.

SOC.

The soil organic C (SOC) content was calculated from SOM, SBD, and volume percentage of gravel with particle size >2 mm at 10 forest sites as follows³³:

$$SOC={sum }_{i=1}^{n}0.58times {H}_{i}times {B}_{i}times {O}_{i}times (1-{rm{theta }})times 100$$

(1)

where SOC is the soil organic C density (g C/m²) of all n layers, H_i is the soil thickness (cm), B_i is the soil bulk density (g/cm³), O_i is the SOM content of the i_th layer (%), and θ is the volume percentage (%) of gravel with particle size >2 mm. In the absence of soil bulk density or soil organic matter content measurements in some layers, the missing soil measurements corresponding to specific soil depths of theses forest ecosystems were supplemented according to the empirical formulas of the relationships between SOM/soil bulk density and soil depth in different layers, which were developed based on the long-term and across-site CERN soil observations³⁴.

All these raw atmospheric, biological, and soil data mentioned above can be directly download from CERN scientific and technological resources service system (http://www.cnern.org.cn/data/initDRsearch) or obtained after online application via protocol sharing.

Auxiliary flux data

Net ecosystem exchange (NEE).

These data were obtained from ChinaFLUX (http://www.chinaflux.org/), covering CBF, QYF, and BNF. The data were aggregated to the daily time step from half-hourly CO₂ flux data measured by the eddy covariance technique and processed with quality control and gap filling procedures³⁵.

Implementation of MDF method

The assimilated data were retrieved from a multiple data-model fusion method (Fig. 2). Specifically, the long-term and dynamic observations of biomass, litterfall, LAI and SOC were used as the model constraint data; Ta, PAR, and RH were used as the meteorological driving data; and the metropolis simulated annealing algorithm, a variation in the MCMC technique^36,37, was applied to retrieve the C cycle parameters (e.g., C allocation and C turnover times) against the observations and prior knowledge. Then, we forward-simulated the model to produce the dynamic and time-continuous changes in ecosystem C sequestration function.

Fig. 2

Flowchart of the generation of assimilated datasets in a multiple- and long-term data assimilation framework.

Full size image

Since the dynamic C cycle observations provided an effective solution to constrain the C cycle states without the steady state assumption (SSA), the novelty of our MDF framework involves estimating these C cycle dynamics in better agreement with the actual dynamic disequilibrium state³⁸. Therefore, the uncertainty in allocation and turnover parameters and in C pool states have largely been reduced based on the time-series observations under the non-SSA (NSSA)^21,39,40, thereby significantly enhancing the model’s ability to predict the C sequestration function^19,41,42.

Carbon cycle process model description

DALEC is a box model of C pools connected via fluxes running at a daily time step and has been applied extensively to the MDF research^21,43. Its main structure (i.e., C cycle, C allocation, and turnover process) is generally consistent with state-of-the-art process-based models (Fig. S2; Table S1), with five pools (i.e., foliage (Cf), fine root (Cr), woody (Cw, including branches, stems, and coarse roots), litter (Clit) and SOM (Csom)) for evergreen forests and an additional labile pool (Clab) of stored C that supports leaf flushing for deciduous forests. The C cycle was initiated with the canopy C influx: gross primary productivity (GPP), which was predicted by the Aggregated Canopy Model (ACM)⁴⁴ (Appendix S1). After GPP is consumed by autotrophic respiration (Ra), the remaining photosynthate (NPP) is allocated to plant tissue pools (Cf, Cr, or Cw). The C exiting from all C reservoirs was based on a first order differential equation with various turnover rates, with temperature and moisture dependency on the turnover from the litter and soil pools. In contrast to the original DALEC model only with temperature scalar f_Ta, here we added a new function f_Sw to express soil moisture pressure on litter and soil decomposition processes (Appendix S1). In general, the C pools and fluxes in DALEC were iteratively calculated at a daily time step and determined as a function of the key turnover and allocation parameters. A detailed model description can be found in Williams et al.⁴⁵ and Fox et al.⁴⁶.

Multiple data-model fusion at the nonsteady state

In a realistic disequilibrium state, C pools are time-variant, i.e., the C efflux is not equal to the C influx (left(frac{dC}{dt}ne 0right)); thus, the MDF was run via the dynamic and long-term CERN observations to constrain the DALEC model at the non-steady state (Eq. 2). Here, to avoid the uncertainty arising from the spin-up process under SSA, we determined the initial state of the C pools by the initial observations of C stocks or by optimization (i.e., Clab, which cannot be directly observed). Then, the turnover and allocation parameters were retrieved under the disequilibrium state with dynamic environmental forcing. This method avoids the considerable uncertainties when invoking the SSA to estimate the initial state of C pools and the C cycle parameters(e.g., allocation coefficients and turnover rates)^39,40,47, which could lead to obvious biases in C sequestration¹⁹.

$$left{begin{array}{l}Delta {C}_{i}ne 0 {C}_{i}left({rm{t}}+1right)={C}_{i}left({rm{t}}right)+{I}_{i}left({rm{t}}right)-{k}_{i}{C}_{i}left({rm{t}}right),{rm{i}}=1,2ldots n {C}_{i}left({rm{t}}=0right)={C}_{i}0end{array}right.$$

(2)

where C_i, I_i, and k_i represent the size, input and turnover rate of the i_th C reservoir, respectively; C_i0 represents the initial state of the i_th C reservoir; t represents the specific model-running time step (daily step); and ΔC_i represents the i_th C pool change between t day and t +1 day when applicable into actual calculation. According to the Bayesian theory, the posterior distributions of the parameters are calculated by maximizing the likelihood function (Eq. 3).

$$L={prod }_{j=1}^{m}{prod }_{i=1}^{{n}_{j}}frac{1}{sqrt{2pi }{sigma }_{j}}{e}^{-{left({x}_{j,i}-{mu }_{j,i}left({boldsymbol{P}}right)right)}^{2}/2{sigma }_{j}^{2}}$$

(3)

where L is the integrated likelihood function; m is the number of multiple data types; n is the number of data points categorized by the j_th data type; x_j,i is the measured value composed of dynamic C cycle observations; μ_j,i(P) represents the modeled fluxes and stocks based on parameters under the NSSA (P); and σ_j is the standard deviation of each data point classified by the j_th data type. Moreover, we imposed a sequence of ecological and dynamic constraints on the model parameter inter-relationships and pool dynamics (Appendix S2), which can significantly reduce uncertainty in model parameters and simulations⁴⁸. The more detailed disequilibrium method can be found in our latest study¹⁹.

Key C-cycle process parameters and C sequestration data assimilation

Key process parameter estimation

Here, we mainly focus on how the C input (i.e., the net primary productivity) partitioned into various plant pools (i.e., foliar, wood, and fine roots), i.e., allocation coefficients, which could be directly determined from the optimized parameters (Fig. S3) of the DALEC model after the step 2: MDF method. Another key process parameter, C turnover time, needs further simple statistical calculation based on the model simulations with optimized parameters. Turnover time is commonly estimated by the equation “τ = stock/flux”^20,49. Since the C influx is not equal to the C efflux in the realistic dynamic disequilibrium state, the turnover time should be defined as the ratio between the mass of a C pool and its outgoing flux⁵⁰. Note that with few natural and anthropogenic disturbances in these well-protected CERN sites^12,18, the C efflux is approximately equivalent to the Rh from soil and litterfall (mortality) and Ra (growth) from vegetation. Hence, the turnover time for vegetation, soil, and whole ecosystem can be derived as follows:

$${tau }_{veg}=frac{{C}_{live}}{{I}_{live}-Delta {C}_{live}}=frac{{C}_{live}}{litterfall+{R}_{a}}$$

(4)

$${tau }_{soil}=frac{{C}_{dead}}{{I}_{dead}-Delta {C}_{dead}}=frac{{C}_{dead}}{{R}_{h}}$$

(5)

$${tau }_{eco}=frac{{C}_{eco}}{{I}_{eco}-Delta {C}_{eco}}=frac{{C}_{live+}{C}_{dead}}{{R}_{a}+{R}_{h}}$$

(6)

where τ_veɡ, τ_soil, and τ_eco refer to the biomass, soil and whole-ecosystem turnover times, respectively; C_live, C_dead and C_eco refer to the live biomass C pool size (C_f, C_r, and C_w,), dead organic C pool size (C_soil and C_litter), and the whole-ecosystem C pool size, respectively; I_live, I_dead and I_eco refer to the C input into the live biomass C pool, dead organic C pool, and whole ecosystem C pool, respectively; ΔC_live, ΔC_dead and ΔC_eco refer to the changes in the live biomass C pool, dead organic C pool size, and whole-ecosystem C pool size, respectively; and R_a, R_h and litterfall refer to the autotrophic and heterotrophic respiration, and turnover from all live C pools (i.e., foliage, fine root and woody pools),respectively, as calculated from the DALEC output driven with the estimated parameters during 2005–2015. Since the C reservoirs, fluxes, and turnover times are instantaneous values, we used the averages of the fluxes and reservoirs over multiple years to reflect the average turnover time during a specific period (i.e., 2005–2015).

Time-continuous C sequestration estimation

The optimized parameter values under the NSSA along with the initial observations of the corresponding C pool sizes were used in forward modeling driven by dynamic environmental variables from 2005 to 2015 to obtain the time-continuous soil and vegetation C storage⁵¹. The difference between the ecosystem C influx (GPP) and ecosystem respiration (Ra+Rh) is used to examine the ecosystem C sequestration, i.e., net ecosystem productivity (NEP). Similarly, the difference between the ecosystem C influx (GPP) and ecosystem autotrophic respiration (Ra) is used to examine the net primary ecosystem productivity (NPP).

Source: Ecology - nature.com