# Bottom-up estimates of reactive nitrogen loss from Chinese wheat production in 2014

### Literature review

We conducted a comprehensive review of relevant literature published since 1995. Studies were extracted from the China National Knowledge Infrastructure and Web of Science using the following keywords: “N (nitrogen) loss OR NO (nitric oxide) emission OR N2O (nitrous oxide) emission OR NH3 (ammonia volatilization) emission OR NO3 (nitric leaching) OR N (nitrogen) runoff AND wheat AND China”. We excluded the following types of experiment: experiments not covering the entire wheat growing season, experiments conducted in greenhouses or laboratories, experiments without a zero-N control, and experiments including manure, controlled-release fertilizer, or inhibitors. In total, we extracted 941 observations from 138 articles, consisting of 121 observations of NO emission, 383 of N2O emission, 185 of NH3 emission, 188 of NO3 leaching, and 64 of Nr runoff. We also extracted data on N application rates, and on climate and soil variables (Fig. 1). Missing climate data were obtained from the China Meteorological Data Network (https://data.cma.cn/), missing values of soil organic carbon (SOC) and total N content were obtained from the National Scientific Fertilizer Network (http://kxsf.soilbd.com/), and missing soil silt, clay, and sand content, bulk density, cation exchange capacity (CEC), and pH data were obtained from the Harmonized World Soil Database (HWSD) v. 1.2 (http://www.fao.org/soils-portal/soil-survey/soilmaps-and-databases/harmonized-world-soildatabase-v12/en). Based on this dataset, the EF of each Nr loss pathway was calculated using the following equation:

$$EF_{i} = \left(E_{treatment} - E_{control}\right) / N\;applied$$

(1)

where i = 1–5 represents NO, N2O, NH3, NO3 leaching, and Nr runoff, respectively. Etreatment is the loss rate in experimental treatments with applied N fertilizer, Econtrol is the loss rate in the experimental control without applied N fertilizer, and N applied is the N application rate corresponding to Etreatment. The resulting data were used to develop RF models to predict the EFs of the five Nr loss pathways.
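As an illustration of Eq. (1), the following minimal Python sketch computes the EF for a single paired observation. The function name, the argument names, and the numbers are hypothetical and are not taken from the dataset; loss rates and the N application rate are assumed to share the same units (e.g. kg N ha⁻¹).

```python
def emission_factor(e_treatment, e_control, n_applied):
    """Eq. (1): share of applied N lost through one pathway.

    e_treatment: loss rate in the fertilized treatment
    e_control:   loss rate in the zero-N control
    n_applied:   N application rate of the fertilized treatment
    All three are assumed to be in the same units.
    """
    if n_applied <= 0:
        # A zero-N plot is a control, so it has no EF of its own.
        raise ValueError("n_applied must be positive")
    return (e_treatment - e_control) / n_applied


# Hypothetical N2O observation (illustrative values only).
ef_n2o = emission_factor(e_treatment=1.5, e_control=0.5, n_applied=200.0)
print(f"EF = {ef_n2o:.4f}")
```

Subtracting the control removes background loss that would occur without fertilizer, which is why experiments lacking a zero-N control were excluded.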

### RF models

RF models outperformed empirical models in previous studies15,18,19. We therefore employed RF models to predict the EFs of NO, N2O, NH3, NO3 leaching, and Nr runoff. Environmental factors were selected via redundancy analysis20. Redundancy analysis, a basic ordination technique for gradient analysis, produces an ordination summarizing the variation in several response variables that can be best explained by a matrix of explanatory variables, based on multiple linear regression. We conducted redundancy analysis using Canoco 5 to analyze the effects of 10 environmental factors on the different EFs: 4 soil physical factors (bulk density, and silt, clay, and sand content), 4 soil chemical factors (pH, SOC, CEC, and total N content), and 2 weather factors (total rainfall and mean temperature during the wheat growing period). Ultimately, the dataset of each pathway contained its own set of environmental factors (Table 1).

When establishing the RF model, the first step was to select k features from a total of m (k < m) in the training dataset, to generate root node d and daughter nodes; the second step was to repeat the first step to generate a forest with n decision trees. Lastly, the testing dataset was used to create a final decision tree21. We randomly split the dataset, consisting of paired environmental factors and EFs of each Nr loss pathway, into 10 parts of equal size. Of these parts, 7/10 were used to train the RF model for each pathway and 3/10 were used to test its performance. We used the “randomForest” R package (https://www.stat.berkeley.edu/~breiman/RandomForests/) to develop the RF models in R software (https://cran.r-project.org/). To reduce random error, we ran each model 500 times and determined the performance based on the average value (Fig. 2).
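The repeated 7/10 versus 3/10 evaluation described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the trivial mean predictor is a stand-in for the RF model so that the sketch stays self-contained, the records are synthetic, RMSE is an assumed performance metric, and all function names are hypothetical.

```python
import random
import statistics


def split_70_30(records, rng):
    """Shuffle the records and return (70% training, 30% testing)."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted EFs."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return (sum(squared) / len(squared)) ** 0.5


def repeated_evaluation(records, fit, n_runs=500, seed=0):
    """Average the test RMSE over n_runs independent random splits.

    fit(train) must return a function mapping a feature tuple to an EF.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train, test = split_70_30(records, rng)
        predict = fit(train)
        observed = [ef for _, ef in test]
        predicted = [predict(features) for features, _ in test]
        scores.append(rmse(observed, predicted))
    return statistics.mean(scores)


def fit_mean(train):
    """Stand-in model (assumption): always predicts the training-set mean EF."""
    mean_ef = statistics.mean(ef for _, ef in train)
    return lambda features: mean_ef


# Synthetic (features, EF) records; the features mimic (pH, rainfall in mm).
data_rng = random.Random(42)
records = [
    ((data_rng.uniform(5.0, 9.0), data_rng.uniform(100.0, 800.0)),
     data_rng.uniform(0.0, 0.02))
    for _ in range(50)
]
mean_rmse = repeated_evaluation(records, fit_mean)
print(f"mean test RMSE over 500 runs: {mean_rmse:.4f}")
```

Averaging over many random splits reduces the dependence of the reported performance on any single partition of a small dataset. In practice, `fit` would wrap a random forest regressor instead of the mean predictor.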

### Grid database

We categorized Chinese wheat production into four agroecological regions based on climate and soil variables: North China, North China Plain, South China, and Southwest China (Fig. S1)22. The grid layer of wheat distribution was derived from ChinaCropArea1 km (https://doi.org/10.17632/jbs44b2hrk.2), which provides a 1-km-grid crop-harvest dataset for wheat across China17. We selected the grid layer from 2014 and integrated nationwide climate and soil data, together with N application rates derived from surveys of farmers, into the grid layer (Fig. 1). We obtained climate and soil data from the same sources used to fill in missing data. Climate data are in the form of 10-year averages23. The climate and soil data were extracted into each grid and used as input variables for the RF models.

### Predicting EFs and calculating Nr loss

The EF of each pathway was predicted in each grid by the corresponding RF model (Fig. 3). Nr loss was calculated by multiplying the predicted EFs by N applied′ using the following equations:

$$E_{ij} = N\;applied_{j}^{\prime} \ast EF_{ij}$$

(2)

$$total\;Nr\;loss_{j} = E_{1j} + E_{2j} + E_{3j} + E_{4j} + E_{5j}$$

(3)

where i = 1–5 represents NO, N2O, NH3, NO3 leaching, and Nr runoff, respectively, and j = 1, 2, 3, … represents the different grids. N applied′ was obtained through a nationwide survey of farmers in 2014. For the survey, 3–10 villages were chosen from each county, and 30–120 random farmers were surveyed. In total, 2.23 million farmers from 1,050 counties were surveyed22. The N application rates were extracted, and the average rate was determined for each county, superimposed using Kriging interpolation, and plotted on a map of China. Finally, the average rates were extracted into the grid layer of Chinese wheat production (Fig. 4a). Total Nr loss (Fig. 4b) was calculated as the sum of the five Nr loss pathways according to Eq. (3) (Fig. 5).
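The per-grid calculation can be sketched in Python as below: each pathway loss is the gridded N application rate multiplied by the predicted EF, and the total is the sum over the five pathways named in the text. The function name, the dictionary keys, and all numeric values are hypothetical and serve only as illustration.

```python
PATHWAYS = ("NO", "N2O", "NH3", "NO3_leaching", "Nr_runoff")


def grid_nr_loss(n_applied, efs):
    """Return per-pathway losses (Eq. 2) and their sum (Eq. 3) for one grid.

    n_applied: N application rate assigned to the grid
    efs:       mapping from pathway name to the predicted EF in that grid
    """
    missing = [p for p in PATHWAYS if p not in efs]
    if missing:
        raise KeyError(f"missing EF for pathways: {missing}")
    losses = {p: n_applied * efs[p] for p in PATHWAYS}
    return losses, sum(losses.values())


# Hypothetical grid cell (illustrative values, not from the database).
efs = {"NO": 0.002, "N2O": 0.005, "NH3": 0.10,
       "NO3_leaching": 0.08, "Nr_runoff": 0.013}
losses, total = grid_nr_loss(n_applied=200.0, efs=efs)
for pathway, loss in losses.items():
    print(f"{pathway}: {loss:.2f}")
print(f"total Nr loss: {total:.2f}")
```

Applying the same function to every grid cell yields maps of pathway-specific and total Nr loss.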

### Database structure

The Nr-wheat 1.0 database of Nr loss associated with Chinese wheat production consists of three files (Fig. 1). The ‘data file’ provides N application rates, EFs, and Nr loss for the five loss pathways (NO, N2O, NH3, NO3 leaching, and Nr runoff). The ‘source file’ contains the studies from which data were extracted to develop the RF models, the code of the RF models, and the subregions of Chinese wheat production. The ‘readme file’ explains the abbreviations used in the ‘data file’ and ‘source file’, and provides the units of all included variables (Fig. 1).

Source: Ecology - nature.com