A data set of distributed global population and water withdrawal from 1960 to 2020

In this section, we describe in detail how the data set was generated, including data collection, data modification and interpolation/extension, and grid data generation (Fig. 2).

Fig. 2

Schematic outline to produce the global population and water withdrawal products.


First, we collect population and water withdrawal data. We gather as much as possible of the national and sub-national permanent population and water withdrawal data released by governments and institutions worldwide. The sources of our data collection are provided here.

Second, we establish national and sub-national interpolation models for missing data. The most appropriate curve model is determined from the shape of the scatter plot of the sample data. The modeling is implemented in Excel and provided country by country.

Third, we create spatial distribution grids. Population density is spread over the administrative unit and the artificial surface, and water intensity is spread over the administrative unit and over the artificial surface and cultivated land (see the Spatial distribution section for details).

Fourth, we verify the data. For population data, we compare the global population from the revised results with World Bank and FAO data, and calculate the correlation and deviation between the revised results and these two data sets. For water withdrawal data, we divide the measured data into calibration and verification periods, re-interpolate using only the calibration-period data, and then assess the simulation accuracy by comparing the simulated values with the verification-period data.
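The population check in the fourth step can be sketched as follows. This is only a minimal illustration: it assumes Pearson correlation and mean relative deviation as the two statistics, which the text does not specify, and the function names and sample numbers are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_relative_deviation(revised, reference):
    """Mean of |revised - reference| / reference over all years (assumed metric)."""
    total = sum(abs(r - f) / f for r, f in zip(revised, reference))
    return total / len(revised)


# Made-up global totals in billions, for illustration only.
revised = [3.0, 3.7, 4.4, 5.3, 6.1]
world_bank = [3.03, 3.68, 4.43, 5.28, 6.11]

r = pearson_r(revised, world_bank)
dev = mean_relative_deviation(revised, world_bank)
```

The same two functions could be applied to the FAO series; the calibration/verification split used for water withdrawal would need the interpolation methods described later in this section.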

Data collection and pretreatment

The data sources include government population data for xx nation and xx sub-nation, government water withdrawal data for xx nations and xx sub-nations, national population and water withdrawal data from the World Bank¹² and FAO⁸, water withdrawal data from the United Nations¹³, national population and water withdrawal data from Eurostat¹⁴, and Globeland30¹⁵ data for 2000 and 2010. Here, xx stands for one of the many countries in the data set and serves only as a placeholder.

Globally, census results obtained by counting the population of each administrative unit, for which a country commits large amounts of manpower and material resources, are currently regarded as the most accurate¹⁶. Between censuses, which are conducted at intervals of several years, statistical departments also achieve high accuracy by estimating overall figures from sample surveys of population change and random sample surveys of fertility rates in selected areas and units. We therefore consider the data released by each country on its official statistics website to be the most reliable.

When national population data are missing, the data and trends of the World Bank and FAO are generally regarded as authoritative. When both the World Bank and FAO data are complete, the World Bank data are used as the reference population data. When the World Bank series is shorter than the FAO series, the FAO data are used as the reference population data¹⁷.

For water withdrawal data, FAO and UN data are generally considered authoritative when government water withdrawal data is missing. When the FAO and UN data are both complete, the FAO data is used as a reference for water withdrawal data.
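The selection rules for reference data can be written as two small helpers. This is a sketch under stated assumptions: each series is a hypothetical dict mapping year to value, series length is measured by the number of years present, and the fallback to UN data when the FAO series is empty is an assumption that the text does not spell out.

```python
def pick_population_reference(world_bank, fao):
    """Choose the reference population series.

    World Bank data are preferred; FAO data are used when the World Bank
    series covers fewer years than the FAO series.
    """
    if len(world_bank) < len(fao):
        return "FAO", fao
    return "World Bank", world_bank


def pick_water_reference(fao, un):
    """Choose the reference water withdrawal series; FAO data are preferred.

    Falling back to UN data when FAO has no values is an assumption.
    """
    if fao:
        return "FAO", fao
    return "UN", un


source, series = pick_population_reference(
    {2000: 6.11, 2001: 6.19},
    {1999: 6.03, 2000: 6.12, 2001: 6.20},
)
```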

Interpolation and extrapolation of national and sub-national population data

When data gaps are substantial, the results obtained by the simplest method often have the most reference value. The following four basic methods are used to process the population data^{9,10,11,18,19,20}.

Interpolation method assuming increasing in arithmetic series

If the government data contain gaps and the values are judged to grow as an arithmetic series, linear interpolation based on a linear growth model can be used. This method suits gaps that are short and across which the data grow at a relatively uniform rate. The interpolation model is as follows:

$${P}_{N,k}=\left[\frac{I\left(j\right)-I\left(i\right)}{j-i}\cdot \left(k-i\right)+I\left(i\right)\right]\cdot {P}_{W,k}$$

(1)

where P_{N,k} is the government data for year k, i ≤ k ≤ j; P_{W,k} is the reference data for year k; and I(j) and I(i) are the ratios of government data to reference data for years j and i, respectively.
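A minimal sketch of Eq. (1) in Python may help make the indices concrete. The function name, the dict-based data layout and the sample values are hypothetical; the authors report implementing their models in Excel.

```python
def interpolate_by_ratio(gov, ref, i, j):
    """Fill government values for the years strictly between i and j (Eq. 1).

    gov: dict year -> government value; must contain years i and j.
    ref: dict year -> reference value; must contain every year from i to j.
    """
    ratio_i = gov[i] / ref[i]  # I(i)
    ratio_j = gov[j] / ref[j]  # I(j)
    filled = {}
    for k in range(i + 1, j):
        # Linearly interpolate the ratio, then rescale the reference value.
        ratio_k = (ratio_j - ratio_i) / (j - i) * (k - i) + ratio_i
        filled[k] = ratio_k * ref[k]
    return filled


# Made-up values: government data exist only for 2000 and 2004.
gov = {2000: 102.0, 2004: 126.0}
ref = {2000: 100.0, 2001: 105.0, 2002: 110.0, 2003: 115.0, 2004: 120.0}
filled = interpolate_by_ratio(gov, ref, 2000, 2004)
```

Here the ratio moves linearly from 1.02 in 2000 to 1.05 in 2004, so the filled values follow the year-to-year shape of the reference series rather than a straight line between the two government values.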

Trend extrapolation method based on general trend curve model

If the government data contain consecutive points, it is better to derive the missing values from the trend of the ratio of government data to reference data. General trend functions such as linear, quadratic, cubic and exponential curves can be used. The fit must be judged comprehensively against the change in the reference data to arrive at the most suitable result. This method is better suited to gaps that are short and in which the data grow quickly.

$${P}_{N,k}=F(k)\cdot {P}_{W,k}$$

(2)

where P_{N,k} is the government data for year k, i ≤ k ≤ j; P_{W,k} is the reference data for year k; and F(k) is the fitted trend of the ratio of government data to reference data in year k.
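As an illustration of Eq. (2), the sketch below fits the simplest of the listed trend functions, a straight line, to the ratio by ordinary least squares and uses it to estimate a missing year. The choice of a linear F(k), the function names and the sample values are assumptions made for this example; a quadratic, cubic or exponential curve would be fitted in the same spirit.

```python
def fit_linear_trend(years, ratios):
    """Least-squares line through (year, ratio) points; returns F(k)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda k: slope * k + intercept


def estimate_by_trend(gov, ref, target_years):
    """Apply Eq. (2): P_N,k = F(k) * P_W,k for each target year k."""
    years = sorted(gov)
    ratios = [gov[y] / ref[y] for y in years]
    trend = fit_linear_trend(years, ratios)
    return {k: trend(k) * ref[k] for k in target_years}


# Made-up values: the ratio rises by 0.01 per year over 2000-2002.
gov = {2000: 100.0, 2001: 111.1, 2002: 122.4}
ref = {2000: 100.0, 2001: 110.0, 2002: 120.0, 2003: 130.0}
estimate = estimate_by_trend(gov, ref, [2003])
```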

Scale up to the same ratio

If government data exist for only one year, the reference data are scaled by the ratio of the government data to the reference data in that year.

$$I=\frac{{P}_{N}}{{P}_{W}},\quad {P}_{N,o}=I\cdot {P}_{W,o}$$

(3)

where P_N is the government data; P_W is the reference data for the same year; I is the ratio of government data to reference data; P_{N,o} is the missing government data; P_{W,o} is the reference data for the missing year; and o is the missing year.
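Equation (3) amounts to a single constant multiplier, as the following sketch shows. The function name and numbers are hypothetical.

```python
def scale_reference(gov_value, gov_year, ref):
    """Scale the whole reference series by the single known ratio (Eq. 3).

    gov_value: the only available government value.
    gov_year:  the year that value refers to.
    ref:       dict year -> reference value, including gov_year.
    """
    ratio = gov_value / ref[gov_year]  # I = P_N / P_W
    return {year: ratio * value for year, value in ref.items()}


# Made-up values: one government figure for 2010.
ref = {2009: 48.0, 2010: 50.0, 2011: 52.0}
scaled = scale_reference(52.5, 2010, ref)
```

With a ratio of 1.05, the estimate for 2011 is 1.05 × 52.0 = 54.6, and the known year reproduces the government value.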

Based entirely on government data or reference data

If the government data are complete, they are used as the final population result. If there are no government data, the reference data are used as the final population result.

Interpolation and extrapolation of national and sub-national water withdrawal data

Total water withdrawal varies greatly between countries, but the per capita water withdrawal of a given country generally remains within a certain range. Therefore, we first calculate the reference data, and then interpolate and extrapolate the missing per capita water withdrawal data. The methods can be summarized into the following five categories.

Interpolation method assuming increasing in arithmetic series

The calculation principle is the same as for the interpolation of national population data. This method is better suited to short gaps between discrete data points, such as the data before 1990 in Fig. 6(c).

Trend extrapolation method based on revised per capita water withdrawal growth rate

If the data contain consecutive points, we assume that the curve of per capita water withdrawal against time follows an S curve, that is, per capita water withdrawal changes only slowly in the earliest and latest years. We first calculate the growth rate of per capita water withdrawal over the first two or the last two known years. We then scale the final growth rate proportionally to project later changes, and scale the first growth rate proportionally to project earlier changes. Equation (4) extrapolates missing values before the known data, and Eq. (5) extrapolates missing values after them. This method is better suited to cases where continuous government data exist and the trend of per capita water withdrawal is clear, such as the continuous data after 1990 in Fig. 6(c).

$$\left\{\begin{array}{rll}{s}_{i} & = & \frac{{w}_{i}-{w}_{i+1}}{{w}_{i+1}}\\ {s}_{i-1} & = & {s}_{i}\cdot (1-\theta )\\ {w}_{i-1} & = & {w}_{i}\cdot (1+{s}_{i-1})\end{array}\right.$$

(4)

$$\left\{\begin{array}{rll}{s}_{j} & = & \frac{{w}_{j}-{w}_{j-1}}{{w}_{j-1}}\\ {s}_{j+1} & = & {s}_{j}\cdot \left(1-\theta \right)\\ {w}_{j+1} & = & {w}_{j}\cdot \left(1+{s}_{j+1}\right)\end{array}\right.$$

(5)

For Eq. (4), w_{i-1} is the missing per capita water withdrawal value for time step i-1; s_{i-1} is the missing reverse-order growth rate for time step i-1; w_i and w_{i+1} are the first two known per capita water withdrawal values, for time steps i and i+1; and s_i is the known reverse-order growth rate for time step i. For Eq. (5), w_{j+1} is the missing per capita water withdrawal value for time step j+1; s_{j+1} is the missing growth rate for time step j+1; w_{j-1} and w_j are the last two known per capita water withdrawal values, for time steps j-1 and j; and s_j is the known growth rate for time step j. To ensure that per capita water withdrawal does not change too quickly at the start or end of the series, the equations introduce θ, a correction coefficient for the growth rate, which is generally in the range of 0.1 to 0.2.
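Equations (4) and (5) can be sketched as a damped growth-rate recursion. The sketch assumes that the equations are applied repeatedly, with each newly estimated value and damped rate feeding the next step; the function names, the default θ of 0.15 and the sample values are hypothetical.

```python
def extrapolate_forward(w, steps, theta=0.15):
    """Extend a per capita series beyond its last known value (Eq. 5).

    w:     known values in chronological order (at least two).
    steps: number of later time steps to estimate.
    theta: correction coefficient for the growth rate.
    """
    out = list(w)
    s = (out[-1] - out[-2]) / out[-2]  # s_j from the last two known values
    for _ in range(steps):
        s *= 1 - theta                 # s_{j+1} = s_j * (1 - theta)
        out.append(out[-1] * (1 + s))  # w_{j+1} = w_j * (1 + s_{j+1})
    return out


def extrapolate_backward(w, steps, theta=0.15):
    """Extend a series before its first known value (Eq. 4).

    Eq. (4) is Eq. (5) applied to the series in reverse order.
    """
    return extrapolate_forward(w[::-1], steps, theta)[::-1]


# Made-up per capita values with a 10% rise between the last two years.
later = extrapolate_forward([100.0, 110.0], steps=2, theta=0.2)
earlier = extrapolate_backward([110.0, 100.0], steps=1, theta=0.2)
```

Because each step multiplies the growth rate by 1 − θ, the rate decays geometrically, which produces the flattening tails of the assumed S curve.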

Scale up to the same ratio or smoothing spline fitting

If only one data point has been released, the per capita water withdrawal of that year is used for all years. For water withdrawal data that span a long period and contain many points but also many gaps, we use a smoothing spline to interpolate smoothly over time, taking into account the balance of fluctuations in per capita water withdrawal.

Proximity of adjacent region

If no national water withdrawal data is released, based on the country’s level of development and geographic location, the per capita water withdrawal of adjacent countries with similar development levels is selected as an approximate value for the country’s per capita water withdrawal value.

Sub-national water withdrawal data are treated in the same way as sub-national population data. First, the ratio of the sub-national data to the national data is calculated for each known year. Next, the interpolation and extrapolation methods are used to estimate the ratio for the missing years. Finally, the sub-national data are obtained by multiplying the national data by the ratio.
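The three steps for sub-national data can be sketched as follows. For brevity the example interpolates the ratio linearly between known years and holds it constant outside them; the constant hold is an assumption for this illustration, and any of the methods described earlier could be substituted. All names and values are hypothetical.

```python
def interpolate_share(shares, year):
    """Linear interpolation of the sub-national share between known years.

    Outside the known range the nearest known share is held constant
    (an assumption made for this sketch).
    """
    years = sorted(shares)
    if year <= years[0]:
        return shares[years[0]]
    if year >= years[-1]:
        return shares[years[-1]]
    for lo, hi in zip(years, years[1:]):
        if lo <= year <= hi:
            t = (year - lo) / (hi - lo)
            return shares[lo] + t * (shares[hi] - shares[lo])


def subnational_series(national, sub_known):
    """Step 1: shares in known years. Steps 2-3: fill shares, rescale."""
    shares = {y: sub_known[y] / national[y] for y in sub_known}
    return {y: interpolate_share(shares, y) * national[y] for y in national}


# Made-up values: the sub-national unit is known only in 2000 and 2010.
national = {2000: 1000.0, 2005: 1200.0, 2010: 1500.0}
sub_known = {2000: 100.0, 2010: 180.0}
sub = subnational_series(national, sub_known)
```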

Spatial distribution

This research further considers the indicative role of specific land use types. Spatial distribution means that the data are distributed over a meaningful area. We assume that population is located only on artificial surfaces and that water is used only on artificial surfaces and cultivated land. We mainly used the Globeland30 data¹⁵ for 2000 and 2010 to process the data before and after 2000, respectively (Figs. 3 and 4).

Fig. 3

The average population density of specific regions from 1960 to 2020. (a) The administrative units. (b) The artificial surface grids. The population of each of these two types of region is allocated evenly to each 1 km grid cell.

Fig. 4

The average water intensity of specific regions from 1960 to 2020. (a) The administrative units. (b) The artificial surface and cultivated land grids. The water withdrawal of each of these two types of region is allocated evenly to each 1 km grid cell.

Using ArcGIS Desktop 10.2, we convert the global land use grid into vector format and then extract the global artificial surface and cultivated land. The population density and water intensity on the grid are expressed as follows²¹:

$$S{D}_{ad,P}=\frac{{P}_{ad}}{{A}_{ad}},\quad S{D}_{lu,P}=\frac{{P}_{ad}}{{A}_{lu,a}}$$

(6)

$$S{D}_{ad,W}=\frac{{W}_{ad}}{{A}_{ad}},\quad S{D}_{lu,W}=\frac{{W}_{ad}}{{A}_{lu,ac}}$$

(7)

where SD_{ad,P} and SD_{ad,W} are the population density and water intensity of an administrative unit, respectively; SD_{lu,P} is the population density on the artificial surface of an administrative unit; SD_{lu,W} is the water intensity on the artificial surface and cultivated land of an administrative unit; P_ad and W_ad are the population and water withdrawal of an administrative unit, respectively; and A_ad, A_{lu,a} and A_{lu,ac} are the area of an administrative unit, the area of its artificial surface, and the area of its artificial surface and cultivated land, respectively.
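Equations (6) and (7) reduce to four divisions per administrative unit, as in this minimal sketch. The function name, dictionary keys, units and sample values are hypothetical, and the sketch does not handle units with zero artificial surface or cultivated land, which would need separate treatment.

```python
def densities(population, withdrawal, area_admin, area_artificial, area_cultivated):
    """Population density and water intensity of one administrative unit.

    Areas must share one unit (for example km^2); population and withdrawal
    are the totals for the administrative unit.
    """
    area_ac = area_artificial + area_cultivated  # A_lu,ac
    return {
        "SD_ad_P": population / area_admin,       # Eq. (6), administrative unit
        "SD_lu_P": population / area_artificial,  # Eq. (6), artificial surface
        "SD_ad_W": withdrawal / area_admin,       # Eq. (7), administrative unit
        "SD_lu_W": withdrawal / area_ac,          # Eq. (7), artificial + cultivated
    }


# Made-up unit: 1 million people, 5e8 m^3 withdrawn, 10,000 km^2 in total,
# of which 500 km^2 is artificial surface and 2,000 km^2 is cultivated land.
d = densities(1_000_000, 5e8, 10_000, 500, 2_000)
```

Concentrating the same totals on the smaller land-use areas raises the density from 100 to 2,000 people per km² and the water intensity from 50,000 to 200,000 m³ per km² in this example.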

Source: Resources - nature.com
