In this section, we describe in detail how the data set was generated, including data collection, data revision with interpolation and extrapolation, and gridded data generation (Fig. 2).
First, we collect population and water withdrawal data. We gather as many national and sub-national records of permanent population and water withdrawal as possible from governments and institutions worldwide, and we list the sources of the collected data here.
Second, we build interpolation models for missing national and sub-national data. The most appropriate curve model is chosen from the shape of the scatter plot of the sample data. The models are implemented in Excel and provided country by country.
Third, we create spatial distribution grids. Population density is distributed over administrative units and artificial surfaces, and water intensity is distributed over administrative units and over artificial surfaces and cultivated land (see the Spatial distribution section for details).
Fourth, we verify the data. For population, we compare the revised global population with World Bank and FAO data, and calculate the correlation and deviation between the revised results and each of the two data sets. For water withdrawal, we divide the measured data into calibration and verification periods, re-interpolate the series using only the calibration-period data, and then assess the simulation accuracy by comparing the simulated values with the verification-period data.
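The comparison in the verification step can be sketched as follows. The text does not say which correlation or deviation measure is used, so the Pearson coefficient and the mean relative deviation below are assumptions, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long series (assumed metric)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_relative_deviation(estimates, reference):
    """Average of |estimate - reference| / reference (assumed metric)."""
    pairs = list(zip(estimates, reference))
    return sum(abs(e - r) / r for e, r in pairs) / len(pairs)
```

For water withdrawal, the same two functions could be applied to the simulated values and the measured values held back for the verification period.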
Data collection and pretreatment
The data sources include government population data for xx nations and xx sub-national units, government water withdrawal data for xx nations and xx sub-national units, national population and water withdrawal data from the World Bank12 and FAO8, water withdrawal data from the United Nations13, national population and water withdrawal data from Eurostat14, and Globeland3015 data for 2000 and 2010. Here, xx stands for one of the many countries in the data set and serves only as a placeholder.
A census, in which a country counts the population of each of its administrative units at a large cost in manpower and material resources, is widely regarded as the most accurate source of population figures currently available16. Between censuses, statistical departments obtain accurate totals from sample surveys of population change and random sample surveys of fertility rates in selected areas and units. We therefore regard the data published on each country's official statistics website as the most reliable.
When national population data are missing, the data and trends of the World Bank and FAO are generally regarded as authoritative. When both series are complete, the World Bank data are used as the reference population data. When the World Bank series is shorter than the FAO series, the FAO data are used as the reference population data17.
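The selection rule for the reference population series can be written as a small function. This is only a sketch: the function name and the dictionary layout (year mapped to value, with missing years simply absent) are assumptions made for illustration.

```python
def choose_population_reference(world_bank, fao):
    """Pick the reference population series for one country.

    Both arguments are hypothetical dicts mapping year -> population,
    containing only the years for which a value is available.
    """
    # Rule from the text: FAO is used only when the World Bank series is shorter.
    if len(world_bank) < len(fao):
        return "FAO", fao
    return "World Bank", world_bank
```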
For water withdrawal, FAO and UN data are generally regarded as authoritative when government data are missing. When the FAO and UN data are both complete, the FAO data are used as the reference water withdrawal data.
Interpolation and extrapolation of national and sub-national population data
When data are clearly scarce, the results of the simplest methods often have the most reference value. The following four basic methods are used to process the population data9,10,11,18,19,20.
Interpolation method assuming increasing in arithmetic series
If the government data contain gaps and the values are judged to grow as an arithmetic series, linear interpolation can be used. The ratio of government data to reference data is interpolated linearly across the gap and then multiplied by the reference data. This method suits gaps that are short and data that grow at a relatively uniform rate. The interpolation model is as follows:
$${P}_{N,k}=\left[\frac{I\left(j\right)-I\left(i\right)}{j-i}\cdot \left(k-i\right)+I\left(i\right)\right]\cdot {P}_{W,k}$$
(1)
where P_{N,k} is the government data for year k, with i ≤ k ≤ j; P_{W,k} is the reference data for year k; and I(j) and I(i) are the ratios of government data to reference data in years j and i, respectively.
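A minimal sketch of Eq. (1), assuming the government and reference series are held in dictionaries keyed by year (the function name and data layout are hypothetical):

```python
def interpolate_by_ratio(gov, ref, i, j):
    """Fill government values for the years strictly between i and j (Eq. 1).

    gov: dict year -> government value, known at least for years i and j.
    ref: dict year -> reference value, known for every year from i to j.
    """
    ratio_i = gov[i] / ref[i]  # I(i)
    ratio_j = gov[j] / ref[j]  # I(j)
    filled = {}
    for k in range(i + 1, j):
        # Linear interpolation of the ratio, then rescaling by the reference.
        ratio_k = (ratio_j - ratio_i) / (j - i) * (k - i) + ratio_i
        filled[k] = ratio_k * ref[k]
    return filled
```

With a ratio of 1.0 in 2000 and 1.2 in 2004, for example, the interpolated ratio for 2002 is 1.1, so a reference value of 110 gives an estimate of about 121.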
Trend extrapolation method based on general trend curve model
If the government data contain runs of consecutive years, missing values are better estimated from the trend of the ratio of government data to reference data. Common trend functions such as linear, quadratic, cubic and exponential curves can be used. The fitted result must be judged against the linear change of the reference data before a suitable result is chosen. This method suits shorter intervals in which the data grow faster.
$${P}_{N,k}=F(k)\cdot {P}_{W,k}$$
(2)
where P_{N,k} is the government data for year k, with i ≤ k ≤ j; P_{W,k} is the reference data for year k; and F(k) is the fitted trend of the ratio of government data to reference data, evaluated at year k.
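As an illustration of Eq. (2), the sketch below fits only the linear case of F(k) by ordinary least squares. The original models were built in Excel and may use quadratic, cubic or exponential curves instead. The function names and data layout are assumptions.

```python
def fit_linear_trend(years, ratios):
    """Least-squares line through (year, ratio) points; returns F as a callable."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda k: slope * k + intercept


def estimate_by_trend(gov, ref, target_years):
    """Estimate government values as F(k) * reference value (Eq. 2)."""
    known_years = sorted(gov)
    ratios = [gov[y] / ref[y] for y in known_years]
    trend = fit_linear_trend(known_years, ratios)
    return {k: trend(k) * ref[k] for k in target_years}
```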
Scale up to the same ratio
If government data exist for only one year, the reference data for all other years are scaled by the ratio of government data to reference data in that year.
$$I=\frac{{P}_{N}}{{P}_{W}},\quad {P}_{N,o}=I\cdot {P}_{W,o}$$
(3)
where P_N is the government data for the known year; P_W is the reference data for the same year; I is the ratio of government data to reference data; P_{N,o} is the missing government value; P_{W,o} is the reference value for the year of the missing value; and o is the year with missing data.
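Eq. (3) amounts to a single rescaling of the reference series. A short sketch, with a hypothetical function name and a dictionary keyed by year as an assumed layout:

```python
def scale_reference(gov_value, gov_year, ref):
    """Scale the whole reference series by the one known ratio (Eq. 3).

    gov_value: the single available government value.
    gov_year:  the year that value refers to.
    ref:       dict year -> reference value, including gov_year.
    """
    ratio = gov_value / ref[gov_year]  # I = P_N / P_W
    return {year: ratio * value for year, value in ref.items()}
```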
Based entirely on government data or reference data
If there is complete government data, the government data is used as the final population result. If there is no government data, the reference data is used as the final result of the population.
Interpolation and extrapolation of national and sub-national water withdrawal data
The total amount of water withdrawal in various countries varies greatly, but the per capita water withdrawal of the country generally remains within a certain range. Therefore, we first calculate the reference data, and then interpolate and extrapolate the missing per capita water withdrawal data. The methods can also be summarized into the following five categories.
Interpolation method assuming increasing in arithmetic series
The calculation principle is the same as for the interpolation of national population data. This method is more suitable for filling intervals in short, scattered data series, such as the data before 1990 in Fig. 6(c).
Trend extrapolation method based on revised per capita water withdrawal growth rate
If the data contain runs of consecutive years, we assume that per capita water withdrawal follows an S-shaped curve over time, that is, it changes only slowly in the earliest and latest years. We first calculate the growth rate of per capita water withdrawal from the first two or the last two known years. The final growth rate is then reduced proportionally to estimate later values, and the first growth rate is reduced proportionally to estimate earlier values. Equation (4) extrapolates missing values before the known series, and Eq. (5) extrapolates missing values after it. This method is more suitable when continuous government data exist and the trend of per capita water withdrawal is clear, such as the continuous data after 1990 in Fig. 6(c).
$$\left\{\begin{array}{rll}{s}_{i} & = & \frac{{w}_{i}-{w}_{i+1}}{{w}_{i+1}}\\ {s}_{i-1} & = & {s}_{i}\cdot (1-\theta )\\ {w}_{i-1} & = & {w}_{i}\cdot (1+{s}_{i-1})\end{array}\right.$$
(4)
$$\left\{\begin{array}{rll}{s}_{j} & = & \frac{{w}_{j}-{w}_{j-1}}{{w}_{j-1}}\\ {s}_{j+1} & = & {s}_{j}\cdot \left(1-\theta \right)\\ {w}_{j+1} & = & {w}_{j}\cdot \left(1+{s}_{j+1}\right)\end{array}\right.$$
(5)
In Eq. (4), w_{i-1} is the missing per capita water withdrawal value for time step i-1; s_{i-1} is the missing reverse-order growth rate for time step i-1; w_i and w_{i+1} are the first two known per capita water withdrawal values, for time steps i and i+1; and s_i is the known reverse-order growth rate for time step i. In Eq. (5), w_{j+1} is the missing per capita water withdrawal value for time step j+1; s_{j+1} is the missing growth rate for time step j+1; w_{j-1} and w_j are the last two known per capita water withdrawal values, for time steps j-1 and j; and s_j is the known growth rate for time step j. To ensure that per capita water withdrawal does not change too fast at the start or the end of the series, the equations introduce θ, a correction coefficient for the growth rate that generally lies in the range 0.1 to 0.2.
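Eqs. (4) and (5) each define a single extrapolation step. The sketch below applies them repeatedly, damping the growth rate by (1 − θ) at every step. That repetition is an assumption about how longer gaps are filled, and the function names and the default θ of 0.15 are chosen only for illustration.

```python
def extrapolate_backward(w_first, w_second, steps, theta=0.15):
    """Estimate values before the series start (Eq. 4).

    w_first, w_second: the first two known per capita values (steps i, i+1).
    Returns the estimates for steps i-1, i-2, ... in that order.
    """
    rate = (w_first - w_second) / w_second  # reverse-order growth rate s_i
    current = w_first
    estimates = []
    for _ in range(steps):
        rate *= 1 - theta          # s_{i-1} = s_i * (1 - theta)
        current *= 1 + rate        # w_{i-1} = w_i * (1 + s_{i-1})
        estimates.append(current)
    return estimates


def extrapolate_forward(w_second_last, w_last, steps, theta=0.15):
    """Estimate values after the series end (Eq. 5).

    w_second_last, w_last: the last two known per capita values (steps j-1, j).
    Returns the estimates for steps j+1, j+2, ... in that order.
    """
    rate = (w_last - w_second_last) / w_second_last  # growth rate s_j
    current = w_last
    estimates = []
    for _ in range(steps):
        rate *= 1 - theta          # s_{j+1} = s_j * (1 - theta)
        current *= 1 + rate        # w_{j+1} = w_j * (1 + s_{j+1})
        estimates.append(current)
    return estimates
```

Because the rate shrinks geometrically, the extrapolated series flattens out, which matches the S-curve assumption of slow change at both ends.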
Scale up to the same ratio or smoothing spline fitting
If only one data value has been released, the per capita water withdrawal of that year is used for all years. For water withdrawal data that span a long period and contain many values but also many gaps, we fit a smoothing spline to interpolate smoothly over time while evening out fluctuations in per capita water withdrawal.
Proximity of adjacent region
If no national water withdrawal data have been released, the per capita water withdrawal of an adjacent country with a similar level of development is used as an approximation, chosen according to the country’s level of development and geographic location.
The treatment of sub-national water withdrawal data is similar to that of sub-national population data. First, the ratio of sub-national data to national data is calculated for the known years. The interpolation and extrapolation methods above are then used to estimate the ratio for the missing years. Finally, the sub-national data are obtained by multiplying the national data by the ratio.
Spatial distribution
This research further considers the indicative role of specific land use types. Spatial distribution here means allocating the data to the areas where they are meaningful. We assume that the population resides only on artificial surfaces and that water is used only on artificial surfaces and cultivated land. We mainly used the globeland30 data15 of 2000 and 2010 to process the data before and after 2000, respectively (Figs. 3 and 4).
Using ArcGIS Desktop 10.2, we convert the global land use grid into vector format and then extract the global artificial surface and cultivated land. The population density and water intensity on the grid are expressed as follows21:
$$S{D}_{ad,P}=\frac{{P}_{ad}}{{A}_{ad}},\quad S{D}_{lu,P}=\frac{{P}_{ad}}{{A}_{lu,a}}$$
(6)
$$S{D}_{ad,W}=\frac{{W}_{ad}}{{A}_{ad}},\quad S{D}_{lu,W}=\frac{{W}_{ad}}{{A}_{lu,ac}}$$
(7)
where SD_{ad,P} and SD_{ad,W} are the population density and water intensity of an administrative unit, respectively; SD_{lu,P} is the population density on the artificial surface of an administrative unit; SD_{lu,W} is the water intensity on the artificial surface and cultivated land of an administrative unit; P_{ad} and W_{ad} are the population and water withdrawal of an administrative unit, respectively; and A_{ad}, A_{lu,a} and A_{lu,ac} are the area of an administrative unit, the area of its artificial surface, and the combined area of its artificial surface and cultivated land, respectively.
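Eqs. (6) and (7) can be evaluated per administrative unit as in the sketch below. The function names are hypothetical. Two details are assumptions rather than details from the text: A_{lu,ac} is taken as the sum of the artificial surface and cultivated land areas, and None is returned when a land-use area is zero.

```python
def safe_ratio(total, area):
    """Return total / area, or None when the area is zero (assumed convention)."""
    return total / area if area > 0 else None


def population_densities(population, admin_area, artificial_area):
    """Eq. 6: density over the whole unit and over its artificial surface."""
    return (safe_ratio(population, admin_area),
            safe_ratio(population, artificial_area))


def water_intensities(withdrawal, admin_area, artificial_area, cultivated_area):
    """Eq. 7: intensity over the whole unit and over artificial + cultivated land."""
    used_area = artificial_area + cultivated_area  # A_{lu,ac}, assumed to be a sum
    return (safe_ratio(withdrawal, admin_area),
            safe_ratio(withdrawal, used_area))
```

All areas passed to one call should share the same unit, for example square kilometres, so that the two densities are comparable.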
Source: Resources - nature.com