A high-resolution gridded grazing dataset of grassland ecosystem on the Qinghai–Tibet Plateau in 1982–2015

Study area

The Qinghai–Tibet Plateau (26°00′-39°47′N, 73°19′-104°47′E), one of the most important pastoral areas in the world, spans southwestern China and includes 244 counties in six provinces: Tibet, Qinghai, Xinjiang, Gansu, Sichuan, and Yunnan. It is characterized by rich natural grassland resources, including desert steppes, alpine steppes, and alpine meadows (Fig. 1a). Grassland accounts for over 56% of this region³⁴. It plays a vital role in providing regional and national animal husbandry products and fodder³⁵, and it supplies local herders with almost all of the resources required for survival³⁶. The grazing density distribution is extremely unbalanced (Fig. 1a) owing to the high spatial heterogeneity of economic development (Fig. 1b-1) and grassland production (Fig. 1b-2), which results from differences in resources and environmental factors³⁷. Over the past few decades, the number of livestock animals has changed significantly, and the number of sheep exceeded 160 million by 2020. A high-resolution gridded grazing dataset is therefore urgently needed to evaluate spatiotemporal changes in grazing and to coordinate the relationship between human beings and the grassland ecosystem.

Fig. 1

Location of the Qinghai–Tibet Plateau: (a) grassland type and distribution, and grazing density (GD) in 244 counties; (b) spatial heterogeneity of economic development (ED) and grassland production (GP) in 244 counties. GD, ED, and GP are represented by sheep unit per grassland area per county (SU/hm²), human footprint index per pixel (HF/pixel) per county, and net primary production per grassland area per county (gC/m²), respectively.


Fig. 2

Methodological framework for grazing spatialization.


Methodological framework

We developed a methodological framework for mapping a high-resolution gridded grazing dataset. The framework has four main parts: (i) identifying features affecting grazing, (ii) extracting the theoretically suitable grazing areas, (iii) building the grazing spatialization model, and (iv) correcting the grazing spatialization dataset. Each step is explained in more detail below (Fig. 2).

Step 1: Identifying features affecting grazing

Grazing activities are affected by the spatial heterogeneity of resources and environmental factors, regulated by the grazing behavior of herders and the foraging behavior of herds, and restricted by ecological protection policies. We identified 14 influencing factors across these four aspects, and their specific implications are presented in Table 1. These factors are necessary for spatializing the county-level grazing data.

Table 1 The identified features affecting grazing.


Step 2: Extracting theoretical suitable grazing areas

The decision tree approach³⁸ was adopted to extract the theoretically suitable grazing areas for further grazing spatialization (step 2 in Fig. 2). First, the potential grazing area was identified according to the boundary of the grassland ecosystem, because grazing only occurs in grassland. Then, the areas unsuitable for grazing, i.e., extremely high-altitude areas and areas adjacent to towns, were removed from the potential grazing area stepwise. The areas where grazing is strictly prohibited, i.e., the core areas of national nature reserves³⁹ within grassland areas, were also deemed unsuitable and removed. The remaining areas were taken as the theoretically suitable grazing areas.
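The stepwise exclusion described above can be sketched as a chain of per-pixel tests. This is a minimal illustration, not the authors' implementation: the `Pixel` fields, the function name, and the two numeric cut-offs (5500 m and 2 km) are hypothetical placeholders, since the text does not state the thresholds used.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    # Hypothetical per-pixel attributes; names are assumptions for illustration.
    is_grassland: bool
    elevation_m: float
    town_distance_km: float
    in_core_reserve: bool


def is_suitable(pixel, max_elevation_m, min_town_distance_km):
    """Apply the exclusion rules in the order given in the text."""
    if not pixel.is_grassland:            # grazing only occurs in grassland
        return False
    if pixel.elevation_m > max_elevation_m:       # extremely high altitude
        return False
    if pixel.town_distance_km < min_town_distance_km:  # adjacent to towns
        return False
    if pixel.in_core_reserve:             # core area of a national nature reserve
        return False
    return True


pixels = [
    Pixel(True, 4200.0, 15.0, False),   # passes every rule
    Pixel(False, 4200.0, 15.0, False),  # not grassland
    Pixel(True, 6100.0, 15.0, False),   # too high
    Pixel(True, 4200.0, 0.5, False),    # too close to a town
    Pixel(True, 4200.0, 15.0, True),    # inside a core reserve
]

# The cut-off values below are placeholders, not values from the paper.
mask = [is_suitable(p, max_elevation_m=5500.0, min_town_distance_km=2.0) for p in pixels]
```

Only the first pixel survives all four rules, so it alone would enter the spatialization step.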

Step 3: Building grazing spatialization model

(i) Extracting cross-scale features (CSFs)

In the traditional method, the spatial resolution of the training data (i.e., the average value at the administrative level) differs from that of the prediction data (i.e., the value at the pixel level), and the trained model can only capture the characteristics present in the training data. However, the extreme values of the prediction data inevitably exceed the range of the training data, which can result in underestimation at these extremes⁴⁰. To reduce this mismatch, we built an improved method for CSF extraction (Fig. 2, first part of step 3).

First, the census grazing data are distributed from the county level to the pixel level using the weight of the absolute disturbance (AD) index, following Eq. (1). The AD index is the Mahalanobis distance in Eq. (2), calculated from the deviation between the potential and observed normalized difference vegetation index (NDVI) values²². Second, the distributed grazing data are graded via hierarchical clustering, and the optimal number of groups is determined using the Davies–Bouldin index (DBI)⁴¹ in Eq. (3), an index for evaluating the quality of a clustering result. The smaller the DBI, the smaller the distances within each group relative to the distances between groups. The DBI can therefore be used to group the most similar values and minimize the deviation within each group. Finally, the two steps above give the extent of the groups within each county, from which we obtain the group-level average of all independent variables and of the dependent variable. In this way, the average value at the county level (traditional features in Fig. 2) is decomposed into average values at the group level (improved features in Fig. 2).

$$SU_{i}=SU_{j}^{C}\,\frac{w_{AD_{i}}}{w_{AD_{j}}}$$

(1)

where SU_i and \(SU_{j}^{C}\) are the grazing value for pixel i and the census grazing value for county j, respectively; \(w_{AD_{i}}\) is the weight of the AD index for pixel i, and \(w_{AD_{j}}\) is the summed weight of the AD index values for all pixels in county j.
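As a worked illustration of Eq. (1), the sketch below spreads one county's census value over its pixels in proportion to their AD weights. The function name and the sample numbers are assumptions made for this example only.

```python
def distribute_census(census_su, ad_weights):
    """Eq. (1): SU_i = SU_j^C * w_AD_i / w_AD_j for every pixel i in county j.

    census_su  -- census grazing value of the county (SU_j^C)
    ad_weights -- AD-index weights of the pixels in that county (w_AD_i)
    """
    total_weight = sum(ad_weights)  # w_AD_j, the summed weight over the county
    if total_weight <= 0:
        raise ValueError("the summed AD weight of a county must be positive")
    return [census_su * w / total_weight for w in ad_weights]


# Hypothetical county with 1000 SU and two pixels weighted 1 and 3.
pixel_su = distribute_census(1000.0, [1.0, 3.0])
```

Because the weights are normalized by their county sum, the pixel values always add back up to the census total, which is what makes this a pure redistribution.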

$$\begin{array}{cll}AD_{i} & = & \sqrt{(D_{i}-u)^{T}\,cov^{-1}\,(D_{i}-u)}\\ D_{i} & = & NDVI_{i}^{T}-NDVI_{i}^{P}\end{array}$$

(2)

where AD_i is the AD index for pixel i; the vectors composed of its observed NDVI \(\left(NDVI_{i}^{T}\right)\) and potential NDVI \(\left(NDVI_{i}^{P}\right)\) time-series data can be considered as two points in the feature space for pixel i; D_i and u are the difference between these vectors and the mean value of that difference, respectively; and cov is the covariance matrix.
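A small pure-Python sketch of Eq. (2) follows. It rests on one reading of the definition that the text does not spell out: u and cov are taken as the mean vector and sample covariance matrix of the difference vectors D_i across all pixels. All function names are hypothetical, and the linear system is solved directly instead of forming the inverse covariance.

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance_matrix(rows, mu):
    """Sample covariance (n - 1 denominator) of the row vectors."""
    dim = len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        d = [row[k] - mu[k] for k in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += d[a] * d[b]
    return [[v / (len(rows) - 1) for v in line] for line in cov]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [matrix[i][:] + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ad_index(observed, potential):
    """AD_i = sqrt((D_i - u)^T cov^-1 (D_i - u)), with D_i = NDVI_i^T - NDVI_i^P."""
    diffs = [[o - p for o, p in zip(obs, pot)] for obs, pot in zip(observed, potential)]
    mu = mean_vector(diffs)
    cov = covariance_matrix(diffs, mu)
    result = []
    for d in diffs:
        centered = [d[k] - mu[k] for k in range(len(mu))]
        weighted = solve(cov, centered)  # equals cov^-1 (D_i - u)
        result.append(math.sqrt(max(0.0, sum(c * w for c, w in zip(centered, weighted)))))
    return result
```

The covariance matrix must be invertible, so this sketch needs more pixels than time steps and NDVI differences that are not perfectly collinear.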

$$DBI_{k}=\frac{1}{k}\sum_{x=1}^{k}\max_{y\ne x}\left(\frac{\overline{a_{x}}+\overline{a_{y}}}{\left|\delta_{x}-\delta_{y}\right|}\right)$$

(3)

where DBI_k is the DBI coefficient when the cluster number is k; \(\overline{a_{x}}\) and \(\overline{a_{y}}\) are the average within-group distances of the x-th and y-th groups, respectively; and δ_x and δ_y are the centers of the x-th and y-th groups, so that |δ_x − δ_y| is the distance between the two group centers.
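The grading step can be illustrated on one-dimensional grazing values. The sketch below computes Eq. (3) and picks the group count with the smallest DBI. The linkage rule is an assumption, since the text does not name one: clusters are merged by nearest centroids, which in sorted 1-D data always joins neighbors. All function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def agglomerate_1d(values, k):
    """Bottom-up clustering of 1-D values into k groups (centroid linkage, assumed)."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        centers = [mean(c) for c in clusters]
        gaps = [centers[i + 1] - centers[i] for i in range(len(centers) - 1)]
        i = gaps.index(min(gaps))  # closest pair of neighboring clusters
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def davies_bouldin(clusters):
    """Eq. (3): mean over groups of the worst (a_x + a_y) / |delta_x - delta_y|."""
    centers = [mean(c) for c in clusters]
    scatter = [mean([abs(v - m) for v in c]) for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for x in range(k):
        ratios = []
        for y in range(k):
            if y == x:
                continue
            gap = abs(centers[x] - centers[y])
            ratios.append((scatter[x] + scatter[y]) / gap if gap > 0 else float("inf"))
        total += max(ratios)
    return total / k


def best_group_count(values, k_min, k_max):
    """Return the k in [k_min, k_max] with the smallest DBI, plus all scores."""
    scores = {k: davies_bouldin(agglomerate_1d(values, k)) for k in range(k_min, k_max + 1)}
    return min(scores, key=scores.get), scores


# Hypothetical pixel-level grazing values forming three visible grades.
values = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 10.0, 10.1, 9.9]
best_k, scores = best_group_count(values, k_min=2, k_max=4)
```

The search range should be kept small relative to the number of values, because groups holding a single value have zero scatter and can pull the DBI down artificially.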

Unlike the traditional method, our method decomposes each county-level feature into multiple features by grading the AD index, so the differences within and among counties are not easily averaged out. Moreover, our method is less affected by scale mismatch and can be transferred to cross-scale modeling²⁶.

(ii) Building RF model with partitioning

A single model cannot accurately capture the variation across the Qinghai–Tibet Plateau, given its high spatial heterogeneity. The partition model, a widely used method for estimating population distribution and other variables⁴²,⁴³, can be incorporated into the proposed model to improve its performance. The thresholds (0.43, 0.35 and 0.21 SU/hm²), determined according to the theoretical livestock carrying capacity (equation S1), were used to separate the independent variables and the dependent variable for each grassland type: alpine meadow, alpine steppe, and alpine desert steppe (see Section 6.1 for details). Then, the RF models were established, with the training and testing samples randomly divided in a 3:1 proportion. Notably, transforming the response variable using the natural log prior to RF model fitting is necessary to achieve higher prediction accuracies⁴⁴. Finally, the independent variables at the pixel level were input into the two trained RF models, and the corresponding gridded grazing dataset was output by combining the two results (Fig. 2, second part of step 3).
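The data preparation around the RF models (partitioning, log transform, and the 3:1 split) can be sketched without fitting any model. Several details are assumptions for illustration: the thresholds are matched to the grassland types in the order listed, samples strictly above the threshold go to the "high" partition, and the sample dictionary keys are hypothetical. The RF fitting itself is deliberately left out.

```python
import math
import random

# Assumed pairing of thresholds (SU/hm²) and grassland types, in listed order.
THRESHOLDS = {
    "alpine meadow": 0.43,
    "alpine steppe": 0.35,
    "alpine desert steppe": 0.21,
}


def partition(samples):
    """Split samples into low/high partitions by the type-specific threshold."""
    low, high = [], []
    for s in samples:
        target = high if s["density"] > THRESHOLDS[s["grass_type"]] else low
        target.append(s)
    return low, high


def log_response(samples):
    """Natural-log transform of the response; densities must be positive."""
    return [math.log(s["density"]) for s in samples]


def split_3_to_1(samples, seed=0):
    """Random 3:1 train/test split."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


# Hypothetical group-level samples (density in SU/hm²).
samples = [
    {"grass_type": "alpine meadow", "density": d} for d in (0.10, 0.40, 0.50, 0.90)
] + [
    {"grass_type": "alpine steppe", "density": d} for d in (0.20, 0.36)
] + [
    {"grass_type": "alpine desert steppe", "density": d} for d in (0.05, 0.30)
]

low, high = partition(samples)
train, test = split_3_to_1(samples, seed=1)
y_train = log_response(train)  # a model fitted on this must be back-transformed with exp
```

Each partition would then get its own model, and pixel-level predictions would be converted back with `math.exp` before the two results are combined.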

(iii) Validating the accuracy of the methods

The performance of the grazing spatialization model was evaluated by comparing the predicted values with the census values²⁶. Accuracy validation indexes, including the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE), were used to evaluate the performance of the proposed RF-based models (Table 2), as presented in Eq. (4).

$$\begin{array}{ccc}R^{2} & = & 1-\frac{\sum_{j=1}^{N}\left(SU_{j}^{C}-SU_{j}^{P}\right)^{2}}{\sum_{j=1}^{N}\left(SU_{j}^{C}-\overline{SU^{C}}\right)^{2}}\\ RMSE & = & \sqrt{\frac{\sum_{j=1}^{N}\left(SU_{j}^{C}-SU_{j}^{P}\right)^{2}}{N}}\\ MAE & = & \frac{\sum_{j=1}^{N}\left|SU_{j}^{C}-SU_{j}^{P}\right|}{N}\end{array}$$

(4)

where \(SU_{j}^{C}\) and \(SU_{j}^{P}\) are the census grazing value and the predicted grazing value for county j, respectively; \(\overline{SU^{C}}\) is the average census grazing value over all counties; and N is the number of counties.
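The three indexes in Eq. (4) translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names and made-up county values, not the authors' evaluation script.

```python
import math


def r_squared(census, predicted):
    """R² = 1 - SSE / SST, with SST taken around the mean census value."""
    mean_census = sum(census) / len(census)
    sse = sum((c - p) ** 2 for c, p in zip(census, predicted))
    sst = sum((c - mean_census) ** 2 for c in census)
    return 1.0 - sse / sst


def rmse(census, predicted):
    """Root mean square error over all counties."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(census, predicted)) / len(census))


def mae(census, predicted):
    """Mean absolute error over all counties."""
    return sum(abs(c - p) for c, p in zip(census, predicted)) / len(census)


# Hypothetical census and predicted values for four counties.
census = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = (r_squared(census, predicted), rmse(census, predicted), mae(census, predicted))
```

For these numbers the squared errors sum to 0.5 against a total variation of 5, so R² is 0.9, RMSE is the square root of 0.125, and MAE is 0.25.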

Table 2 The proposed methods and their descriptions.


Step 4: Correcting grazing spatialization dataset

(i) Correcting residuals of dataset

Correcting residuals is necessary to obtain datasets with higher accuracy⁴⁵,⁴⁶, because propagating the cross-scale relationship in the RF models inevitably generates errors⁴⁷. The residuals, calculated as the difference between the average census grazing value and the average predicted grazing value at the administrative level, were used to calibrate the errors of all pixels within the corresponding county. The revised dataset after residual correction is the final product provided in this study. The residual correction method is expressed by Eq. (5), and the process is shown in the fourth step in Fig. 2.

$$S{U}_{i}^{RP}=S{U}_{i}^{P}+{R}_{j}$$

(5)

where \(SU_{i}^{RP}\) denotes the predicted grazing value revised by the residuals for pixel i, \(SU_{i}^{P}\) denotes the predicted grazing value for pixel i, and R_j denotes the residual calculated from the difference between the census grazing and predicted grazing data for county j.
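Eq. (5) can be sketched for a single county as follows. The sketch assumes that the census total is first expressed as a per-pixel average so that it is comparable with the pixel predictions; that unit handling, the function name, and the sample values are assumptions for illustration.

```python
def correct_residuals(predicted_pixels, census_total):
    """Eq. (5): SU_i^RP = SU_i^P + R_j for every pixel i in county j.

    R_j is taken here as the county-average census value minus the
    county-average prediction (both per pixel).
    """
    n = len(predicted_pixels)
    residual = census_total / n - sum(predicted_pixels) / n  # R_j
    return [p + residual for p in predicted_pixels]


# Hypothetical county: three pixels predicted at 10, 20, 30 SU; census total 90 SU.
revised = correct_residuals([10.0, 20.0, 30.0], 90.0)
```

After the shift, the pixel values sum to the census total while the spatial pattern within the county is unchanged. Because the correction is additive, a strongly negative residual could push low pixels below zero, which a production workflow would need to handle.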

(ii) Validating the accuracy of dataset

Two goodness-of-fit indexes were used to validate the consistency of the spatial distribution and of the temporal trend between the predicted grazing data and the census grazing data. Specifically, the coefficient of determination (R²), defined in Eq. (4), is used to verify the consistency of the spatial distribution, and the Nash–Sutcliffe efficiency (NSE, Eq. (6)) is used to verify the consistency of the temporal trend. An index value closer to 1 corresponds to a more accurate dataset. In addition, we collected field grazing data from 56 sites to further validate the spatial accuracy of the dataset, which was measured using the R² in Eq. (4).

$$NSE=1-\frac{\sum_{t=1}^{T}\left(SU_{t}^{RP}-SU_{t}^{C}\right)^{2}}{\sum_{t=1}^{T}\left(SU_{t}^{C}-\overline{SU^{C^{\prime}}}\right)^{2}}$$

(6)

where \(SU_{t}^{RP}\) and \(SU_{t}^{C}\) are the predicted grazing value revised by residuals and the census grazing value of all counties in year t, respectively; \(\overline{SU^{C^{\prime}}}\) is the average census grazing value over all years; and T is the number of time steps.
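A short sketch of Eq. (6), using hypothetical yearly totals and a hypothetical function name, shows how the NSE compares the revised predictions with the census series.

```python
def nash_sutcliffe(census_by_year, revised_by_year):
    """Eq. (6): NSE = 1 - sum((RP_t - C_t)^2) / sum((C_t - mean(C))^2)."""
    mean_census = sum(census_by_year) / len(census_by_year)
    error = sum((r - c) ** 2 for r, c in zip(revised_by_year, census_by_year))
    spread = sum((c - mean_census) ** 2 for c in census_by_year)
    return 1.0 - error / spread


# Hypothetical census and revised predictions for four years.
census_years = [10.0, 12.0, 14.0, 16.0]
revised_years = [11.0, 12.0, 13.0, 16.0]
nse = nash_sutcliffe(census_years, revised_years)
```

Here the squared errors sum to 2 and the census spread is 20, giving an NSE of 0.9. A value of 1 means the series match exactly, and a value of 0 means the predictions are no better than the long-term census mean.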

(iii) Identifying uncertainties associated with dataset

The uncertainties associated with the dataset originate from two sources. First, limitations of our method, namely errors related to cross-scale modeling or an inappropriate selection of influencing factors, are an important source of uncertainty. Second, the incompleteness of the auxiliary variables also introduces uncertainty. For instance, grassland is not accurately identified in some counties: they are mapped as grassland-free even though livestock are raised there, so they have no effective value for livestock density prediction. Overall, the uncertainties can be quantified by the mean relative error (MRE) in Eq. (7).

$$MRE=\frac{\sum_{j=1}^{N}\left|\frac{SU_{j}^{C}-SU_{j}^{RP}}{SU_{j}^{C}}\right|}{N}\times 100\%$$

(7)

where \(SU_{j}^{C}\) is the census grazing value for county j, \(SU_{j}^{RP}\) is the predicted grazing value revised by residuals for county j, and N is the number of counties.
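Eq. (7) can be checked with a two-county example. The function name and the numbers below are illustrative assumptions.

```python
def mean_relative_error(census, revised):
    """Eq. (7): average of |(C_j - RP_j) / C_j| over counties, in percent."""
    if any(c == 0 for c in census):
        raise ValueError("census values must be non-zero to form a relative error")
    ratios = [abs((c - r) / c) for c, r in zip(census, revised)]
    return sum(ratios) / len(ratios) * 100.0


# Hypothetical counties: 100 SU predicted as 90, and 200 SU predicted as 220.
mre = mean_relative_error([100.0, 200.0], [90.0, 220.0])
```

Both counties are off by 10% of their census value, so the MRE is 10%. Counties with a census value of zero have to be excluded or handled separately, because the relative error is undefined for them.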

Data source

Census grazing data at county level

Eight types of livestock, namely cattle, yaks, horses, donkeys, mules, camels, goats, and sheep, were considered according to the regional characteristics, and the year-end stocking quantity of each type in each county was taken from statistical yearbooks. However, the numbers of livestock at the county level were not recorded for some years between 1982 and 2015. The missing data were approximated indirectly from city- or provincial-level data (e.g., by interpolation using their temporal trends). The stocking quantity of each livestock type was converted into standard sheep units (SU) using Eq. (8)⁴⁸, following the national standard for the calculation of rangeland carrying capacity (NY/T 635-2015). Of the 244 counties of the Qinghai–Tibet Plateau, only 242 were considered, as the census grazing data for the other 2 counties were unavailable. The unit of the grazing statistics at the county level is SU per county per year (SU·county⁻¹·year⁻¹).

$$\begin{array}{l}SU=N_{sheep}+0.8\times N_{goats}+5\times N_{cattle}+5\times N_{yaks}+6\times N_{horses}+3\times N_{donkeys}+6\times N_{mules}+7\times N_{camels}\end{array}$$

(8)

where N_sheep, N_goats, N_cattle, N_yaks, N_horses, N_donkeys, N_mules, N_camels are the number of sheep, goats, cattle, yaks, horses, donkeys, mules, and camels at the year-end, respectively. SU denotes the standard sheep unit (SU·county⁻¹·year⁻¹).
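The conversion in Eq. (8) is a weighted sum, sketched below with the coefficients taken from the equation. The dictionary layout, function name, and the sample herd are assumptions for illustration.

```python
# Sheep-unit coefficients as given in Eq. (8).
SU_COEFFICIENTS = {
    "sheep": 1.0,
    "goats": 0.8,
    "cattle": 5.0,
    "yaks": 5.0,
    "horses": 6.0,
    "donkeys": 3.0,
    "mules": 6.0,
    "camels": 7.0,
}


def to_sheep_units(year_end_counts):
    """Convert year-end head counts per livestock type into standard sheep units."""
    unknown = set(year_end_counts) - set(SU_COEFFICIENTS)
    if unknown:
        raise KeyError(f"no coefficient for livestock type(s): {sorted(unknown)}")
    return sum(SU_COEFFICIENTS[kind] * count for kind, count in year_end_counts.items())


# Hypothetical county herd: 100 sheep, 10 goats, and 2 yaks.
county_su = to_sheep_units({"sheep": 100, "goats": 10, "yaks": 2})
```

For this herd the result is 100 + 0.8 × 10 + 5 × 2 = 118 SU. Livestock types missing from the input simply contribute nothing to the sum.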

Data of grazing influencing factors at pixel level

The types of features affecting grazing were obtained from the first step described in Methods, and the detailed information, such as original spatiotemporal resolution, format, and source, is shown in Table 3. The format (i.e., GeoTIFF), spatial resolution (i.e., 0.083°), and the number of rows and columns of the gridded features were leveraged to further produce a high-resolution grazing dataset.

Table 3 Data source of grazing influence factors.


Source: Ecology - nature.com
