A dataset of winter wheat aboveground biomass in China during 2007–2015 based on data assimilation

We selected eleven major wheat production provinces of China for the study area, which comprise the largest winter wheat-sowing fraction: Henan, Shandong, Anhui, Jiangsu, Hebei, Hubei, Shanxi, Shaanxi, Sichuan, Xinjiang, and Gansu (Fig. 1). The wheat planting area is about 22 million ha in these provinces, accounting for more than 93% of the total wheat planting area. The total wheat production in these regions contributes more than 96% of the total wheat production in China, with more than 128 million tons in 2019³³.

We developed a methodological framework for high-resolution AGB mapping. It mainly includes three parts: (1) Data acquisition and processing. (2) The WOFOST model parameterization and calibration. (3) Data assimilation (Fig. 1). Each part is explained in more detail below.

Data acquisition and processing

Meteorological data

China Meteorological Forcing Dataset^34,35 is used as weather driving data for the WOFOST model. The dataset is based on the internationally existing Princeton reanalysis data, Global Land Data Assimilation System data, Global Energy and Water Cycle Experiment-Surface Radiation Budget radiation data, and Tropical Rainfall Measuring Mission precipitation data. It is made by fusing the conventional meteorological observation data of the China Meteorological Administration. It includes seven elements: near-surface air temperature, air pressure, near-surface total humidity, wind speed, ground downward shortwave radiation, ground downward longwave radiation, and ground precipitation rate. The meteorological drive elements required for WOFOST are daily radiation, minimum temperature, maximum temperature, water vapor pressure, average wind speed, and precipitation. Details of these variables that participated in the WOFOST model can be referred to in Table S1.

Soil characteristics measurements and phenology observations

Soil and phenology data were collected at 177 agricultural meteorological stations (AMS) from 2007 to 2015 (Fig. 1). Soil characteristics include soil moisture content at wilting points, field capacity, and saturation. To be consistent with the corresponding units in the crop model, the original data in weight water content was converted into volume water content through the corresponding soil bulk density measurements. Winter wheat phenology observations include the date of emergence (more than 50% of the wheat seedlings in the field show the first green leaves and reached about 2 cm), anthesis (the inner and outer glumes of the middle and upper florets of more than 50% of the wheat ears in the whole field are open, and the anthers loose powder), and maturity (more than 80% of the wheat grains turn yellow, the glumes and stems turn yellow, and only the upper first and second nodes are still slightly green). In most cases, the phenological stage “anthesis” is missing. The anthesis date was calculated by adding seven days to the observed heading date (when more than 50% of the wheat in the whole field exposes the tip of the ear from the sheath of the flag leaf).

County-level yield statistics data

The county-level yield data was collected from city statistical yearbooks of the study area from 2007 to 2015. Since most statistical yearbooks do not directly record per-unit yield data, the county-level yield was obtained by dividing the total yield and planting area. It is worth noting that all yields were calculated in units of metric kilograms per cultivated hectares (kg·ha⁻¹).

The winter wheat land cover data

We used a winter wheat land cover product from a 1 km resolution dataset named ChinaCropArea1km³⁶. This data was derived from GLASS leaf area index products and crop phenology from 2000 to 2015. This dataset is the base map of our data production.

The MODIS LAI data

We used the improved 8-days MODIS LAI products (i.e., 1 km) generated by Yuan et al.³² to assimilate the WOFOST model. The products applied the modified temporal-spatial filter and Savitzky-Golay filter to overcome the spatial-temporal discontinuity and inconsistence of raw MODIS LAI products, which makes them more applicable for the realm of land surface and climate modeling. The products can be accessed via the Land-Atmosphere Interaction Research Group website at Sun Yat-sen University (http://globalchange.bnu.edu.cn/research/lai).

The WOFOST model parameterization and calibration

The WOFOST model introduction

The WOFOST model was initially developed as a crop growth simulation model to evaluate the yield potential of various crops in tropical countries³⁷. In this study, we chose the WOFOST model because the model reaches a trade-off of the complexity of the crop model and is suitable for large-scale simulations³. The WOFOST model is a typical crop growth model that explains crop growth based on underlying processes such as photosynthesis and respiration and their response to changing environmental conditions³⁸. The WOFOST model estimates phenology, LAI, aboveground biomass, and storage organ biomass (i.e., grain yield) at a daily time step³⁹ (Fig. 2).

Fig. 2

Schematic overview of the major processes implemented in WOFOST. The Astronomical module calculates day length, some variables relating to solar elevation, and the fraction of diffuse radiation.

Full size image

Zonal parameterization

We first divided the study area covered by AMS into seamless Thiessen polygon zones. Each Thiessen polygon contains only a single AMS. These zones represent the whole areas where any location is closer to its associated AMS point than any other AMS point. Then, we assigned parameters to the entire zone based on the AMS data, including crop calendar (date of emergence) and soil water retention parameters (soil moisture content at wilting point, field capacity, and saturation). Besides, we also optimized two main crop parameters for controlling phenological development stages, namely TSUM1 (accumulated temperature required from emergence to anthesis) and TSUM2 (accumulated temperature required from anthesis to maturity), by minimizing the cost function of the observational and simulated date corresponding to anthesis and maturity.

Parameter calibration within a single zone

We implemented the calibration of parameters within every single zone, as illustrated in Fig. 3. We calculated the average statistical yield of each county within every single zone from 2007 to 2015, then ranked the counties in descending order and divided them into three groups, namely high, medium, and low-level yield counties, by the 33% quantile and 67% quantile of the average statistical yield. The three counties corresponding to 17% quantile, 50% quantile, and 83% quantile would be used for subsequent calibration and represent the corresponding three yield level groups. We used the statistical yields (converted to dry matter mass based on the standard moisture content of 12.5%) of the three counties for multiple years and a harvest index for each province to convert the county-level yield to AGB for calibration. The harvest index of each province was mainly estimated from the AMS’s dynamic growth records on the biomass composition of the dominant winter wheat varieties of the province and a published literature⁴⁰. Besides, we collected the maximum LAI observations on all agrometeorological stations in all years in the study area, according to its histogram. We found that the histogram follows a normal distribution with a mean of 6.5 and a standard deviation of 1.5. Finally, we calibrated three sets of parameters corresponding to three yield level groups in each single zone according to the three selected counties.

Fig. 3

Flow chart of parameter calibration within a single zone.

Full size image

We designed a three-step calibration strategy for a specific yield level group. Firstly, as winter wheat varieties did not change significantly according to information recorded by agrometeorological stations from 2007 to 2015, we assumed the crop parameters of winter wheat remain unchanged every three years to combine three years of observational data to calibrate the parameters of the WOFOST model better. We maximized a log-likelihood function based on the maximum LAI statistics and every three-year county-level yield and AGB data mentioned to optimize selected crop parameters (see Table S2 in the Supplement Materials).

The log-likelihood function was constructed as follows:

$$log;{{rm{L}}}_{{rm{LAI}}}=-frac{1}{2}left[dlogleft(2pi right)+logleft(left|{Sigma }_{{rm{LAI}}}right|right)+{rm{MD}}{left({{bf{x}}}_{{rm{LAI}}};{mu }_{{rm{LAI}}},{Sigma }_{{rm{LAI}}}right)}^{2}right]$$

(1)

$$log;{{rm{L}}}_{{rm{TWSO}}}=-frac{1}{2}left[dlog(2pi )+logleft(left|{{boldsymbol{Sigma }}}_{{rm{TWSO}}}right|right)+{rm{MD}}{left({{bf{x}}}_{{rm{TWSO}}};{{boldsymbol{mu }}}_{{rm{TWSO}}},{{boldsymbol{Sigma }}}_{{rm{TWSO}}}right)}^{2}right]$$

(2)

$$log;{{rm{L}}}_{{rm{AGB}}}=-frac{1}{2}left[dlog(2pi )+logleft(left|{{boldsymbol{Sigma }}}_{{rm{AGB}}}right|right)+{rm{MD}}{left({{bf{x}}}_{{rm{AGB}}};{{boldsymbol{mu }}}_{{rm{AGB}}},{{boldsymbol{Sigma }}}_{{rm{AGB}}}right)}^{2}right]$$

(3)

$$log;{rm{L}}=log;{L}_{{rm{LAI}}}+log;{L}_{{rm{TWSO}}}+log;{L}_{{rm{AGB}}}$$

(4)

Where log L is the natural logarithm of the likelihood function, d is the dimension, that is, the number of years of joint calibration, which is set to 3 in this study x_LAI is the vector composed of the maximum value of the 3-year LAI simulated by WOFOST, μ_LAI and Σ_LAI are the mean vector and error covariance matrix of maximum LAI based on observation statistics. The annual maximum LAI was assumed to be independent, and the mean and standard deviation for each year was set the same as the result of Fig. 3. Similarly, x_TWSO and x_AGB are the yield vector and AGB vector at maturity of 3 years simulated by WOFOST, and μ_TWSO, μ_AGB are their corresponding county-level statistic vector, Σ_TWSO and Σ_AGB are their corresponding error covariance matrix. In this study, we assumed that the annual yield or AGB was independent, and their corresponding standard deviation was 10% of their statistical value. |Σ| is the determinant of Σ. The expression ({rm{MD}}{({bf{x}};{boldsymbol{mu }},{boldsymbol{Sigma }})}^{2}={({bf{x}}-{boldsymbol{mu }})}^{{rm{T}}}{{boldsymbol{Sigma }}}^{-1}({bf{x}}-{boldsymbol{mu }})), where MD is the Mahalanobis distance between the point x and the mean vector μ.

Secondly, we optimized the inter-annual irrigation. We optimized two parameters every year: the critical value of soil moisture (denoted as SMc) and the amount of irrigation (denoted as V). When the soil moisture simulated by WOFOST is lower than SMc, an irrigation event will be triggered, and the irrigation amount is V cm. In this study, we combined three-year data for calibration with six parameters for optimization. The optimization strategy is the same as the previous step by maximizing the log-likelihood function. Finally, we fixed the optimized irrigation parameters and repeated the first step to calibrate the selected crop parameters and obtain the final optimal parameters.

Data assimilation

Considering that MODIS LAI is relatively low compared to the actual LAI of winter wheat⁴¹, we select a weak-constraint cost function based on the least square of normalized observational and simulated LAI as shown in Eq. (5), which is assimilating the trend information of MODIS LAI into the crop growth model.

$$J={sum }_{{rm{t}}=1}^{{rm{n}}}{left(frac{{{rm{LAI}}}_{{rm{MODIS}}}^{{rm{t}}}-{{rm{LAI}}}_{{rm{MODIS}}}^{min}}{{{rm{LAI}}}_{{rm{MODIS}}}^{max}-{{rm{LAI}}}_{{rm{MODIS}}}^{min}}-frac{{{rm{LAI}}}_{{rm{WOFOS}}}^{{rm{t}}}-{{rm{LAI}}}_{{rm{WOFOS}}}^{min}}{{{rm{LAI}}}_{{rm{WOFOS}}}^{max}-{{rm{LAI}}}_{{rm{WOFOS}}}^{min}}right)}^{2}$$

(5)

Where ({{rm{LAI}}}_{{rm{MODIS}}}^{{rm{t}}}) and .. are MODIS LAI and WOFOST simulated LAI of time t. ({{rm{LAI}}}_{{rm{MODIS}}}^{max}) and ({{rm{LAI}}}_{{rm{WOFOS}}}^{max}) are maximum of MODIS LAI and WOFOST simulated LAI. ({{rm{LAI}}}_{{rm{MODIS}}}^{min}) and ({{rm{LAI}}}_{{rm{WOFOS}}}^{min}) are minimum of MODIS LAI and WOFOST simulated LAI. J is the value of the cost function.

We reinitialize the day of emergence (IDEM), the life span of leaves growing at 35 °C (SPAN), and thermal time from emergence to anthesis (TSUM1) in the WOFOST model on each 1 km winter wheat pixel according to cost function between WOFOST LAI and MODIS LAI. Besides, we applied the Subplex algorithm from the NLOPT library (https://github.com/stevengj/nlopt) for parameter optimization.

Source: Ecology - nature.com