Data source and preprocessing
Considering factors such as amount of cloud and time intervals of image, four remote sensing images with a spatial resolution of 30 m, including Landsat 5 Thematic Mapper (TM) images for 08-21-2000, 09-04-2005 and 09-18-2010, and Landsat 8 Operational Land Imager (OLI) for 09-02-2016,were obtained from the Geospatial Data Cloud Platform (http://www.gscloud.cn). LULC information was extracted from these remote sensing images. In addition, the digital elevation model (DEM) with a spatial resolution of 30 m was obtained from the website. Elevation and slope information were derived from DEM data and used as terrain driving factors for scenario simulation. Other supporting data, such as Weishan County land use data, mine distribution data, general land use planing (2006–2020) and mineral resources planning (2008–2015), Jining City coal mining subsidence land rearrangement planning (2016–2030), were obtained from Weishan Natural Resources and Planning Bureau. These data were used for better data analysis.
Considering severe ground subsidence and seeper in the study area, and referring to national standards: Current Land Use Classification (GB/T 21010-2017), remote sensing images were interpreted into six LULC types: farmland, other agricultural land, urban and rural construction land, subsided seeper area, water area, and tidal wetland.
In the process of image interpretation, firstly, the remote sensing image was divided into two regions: one region were the lake and the surrounding tidal wetland, and the other region included farmland, other agricultural land, urban and rural construction land, subsided seeper area, etc.
In region 1, decision tree classification, combined with the Modified Normalized Difference Water Index (MNDWI), was used to extract lakes. Then we masked them in region 1. The Normalized Difference Vegetation Index (NDVI) was calculated for the remaining image of region 1. Tidal wetland was mainly distributed along rivers and lakes, and NDVI value was higher than that of farmland and other vegetation. By analyzing its geographical distribution and NDVI value, and referring to Weishan County land use data, the appropriate threshold was selected to extract tidal wetland.
The spectral signature of rivers, ditches and aquaculture ponds in other agricultural land in region 2 could be easily distinguished from other surface features. They could be extracted step by step by manual visual interpretation and empirical knowledge, referring to Weishan County land use data and water system data. Then we masked them separately in region 2. After extracting rivers, ditches, aquaculture ponds with high water content, the remaining LULC type with high water content in region 2 was subsided seeper area. According to the relationship of spectral signature of different LULC types, it was concluded that among the remaining LULC types in region 2, only TM3 band value of subsided seeper area was higher than TM5 band value. Using this characteristic, subsided seeper area could be distinguished from other LULC types. After extracting subsided seeper area, the remaining LULC types in region 2 were farmland and urban and rural construction land. The spectral characteristics of them were very different. Therefore, they could be distinguished using support vector machine (SVM) classification method, and their respective binary images were generated using decision tree method.
The extracted six LULC types were shown in single layer and binary form respectively. Six LULC types were coded and synthesized into one image. We obtained 2000, 2005, 2010, 2016 LULC type maps (Fig. 2). Finally classification post-processing and accuracy evaluation were operated.
The LULC types maps of 2000, 2005, 2010 and 2016. Maps were generated using ArcGIS 10.1 for Desktop (http://www.esri.com/software/arcgis/arcgis-for-desktop).
The accuracy of the interpretation results was verified by confusion matrix and kappa coefficient. The kappa coefficients of the four interpretation maps were 0.84, 0.85, 0.82 and 0.86, respectively (Table 1). The accuracy could meet the needs of further research.
By reading previous research results37,38,39,40,41, based on the entropy theory, in the same study area, high spatial resolution data contains more entropy than low spatial resolution data, and reflecting more detailed information, but it will increase the uncertainty of prediction results and reduce the prediction accuracy. Although the prediction accuracy of low spatial resolution data increases, it will lose lots of detailed information. In order to ensure the accuracy of the simulation, considering the area of the study area and data requirement of the CLUE-S model, the interpreted LULC maps with a resolution of 30 m exceed the upper limit of the CLUE-S model data requirement, so the LULC maps were resampled to multiple scales including 60 m, 90 m, 120 m, and 150 m to facilitate logistic regression analysis of LULC types and driving factors.
Selection and processing of driving factors
To interpret the relationship between the LULC and its driving factors in the mining area, we not only need to identify the driving factors that have greater explanatory power for LULC change, but also need to quantitatively describe the relationship between driving factors and LULC types.
Considering the accessibility, usability of the data and the actual conditions in the study area, seven driving factors were selected based on the land use map of Weishan County in 2005 and the DEM data5,10,11,13,26,28,29,30. The driving factors included: (1) terrain factors, including elevation and slope factors; (2) five accessibility factors, including the nearest distance between each grid pixel and the main roads, the major rivers, the residential area, the major mines, and the ditches. The 30 m grid data of each driving factor were resampled to 60 m, 90 m, 120 m and 150 m respectively.
In this study, BLRM was used to explore the relationship between LULC change and the related 7 driving factors. BLRM is sensitive to multicollinearity. In order to eliminate the influence of collinearity on the regression results, the multicollinearity between independent variables was diagnosed before the regression model was established.
The receiver operating characteristic (ROC) curve was used to evaluate the accuracy of regression analysis results at different scales. The results showed that using 60 m resolution provided more accurate regression analysis results and suffered less loss of LULC and driving factor information during resampling. Therefore, we used 60 m × 60 m grid cell data to driving forces analysis.
Raster maps of each driving factor at a resolution scale of 60 m are shown in Fig. 3.
Raster maps of driving factors at a resolution scale of 60 m. Maps were generated using ArcGIS 10.1 for Desktop (http://www.esri.com/software/arcgis/arcgis-for-desktop).
Logistic regression analysis of LULC types and driving factors
BLRM is often used for regression analysis of explanatory binary variables. The presence and absence of a certain type of LULC in a specific area is set as 1 and 0, respectively, which is characteristic for binary variable. Therefore, we used BLRM to calculate the probability (P) of various LULC types in a specific spatial location, and its mathematical expression is:
$$begin{aligned} ln left( frac{P}{1-P}right) = beta _0 + beta _1 X end{aligned}$$
(1)
where (frac{P}{1-P}) is the ’odds ratio’ of an event, abbreviated as ( Omega ), which represents the odds that an outcome will occur given a particular condition compared to the odds of the outcome occurring in the absence of that condition; (beta _0) is a constant; (beta _1) is the correlation coefficient of an explaining variable and an explained variable. Making mathematical transformation of the above expression, we get: (Omega = (frac{P}{1-P}) = e^{beta _0 + beta _1 X}).
Regression analysis using BLRM, we divided the study area into many grid cells. Taking each LULC type as the explained variable, and the driving factor causing LULC change as the explanatory variable, we calculated the odds ratio of each LULC type in a specific spatial location, and analyzed the relationship between each LULC type and the driving factors. The calculating equation is:
$$begin{aligned} mathrm{Logit} P = ln left( frac{P_i}{1-P_i}right) = beta _0 + beta _1 X_{1,i} + beta _2 X_{2,i} + cdots + beta _n X_{n,i} end{aligned}$$
(2)
Making mathematical transformation of the above equation, we get:
$$begin{aligned} P_i = frac{e^{(beta _0 + beta _1 X_{1,i} + beta _2 X_{2,i} + cdots + beta _n X_{n,i})}}{1+e^{(beta _0 + beta _1 X_{1,i} + beta _2 X_{2,i} + cdots + beta _n X_{n,i})}} end{aligned}$$
(3)
where: (P_i) is the probability of a certain LULC type i in a grid cell, (X_{1,i}sim X_{n,i}) are the driving factors of LULC type i, (beta _0) is the constant, (beta _1sim beta _n) are the correlation coefficients of each driving factor and LULC type i.
The receiver operating characteristic (ROC) was used to evaluate the accuracy of regression analysis results. The accuracy can be measured by calculating the area under the ROC curve. The area value is between 0.5 and 1. The closer the value is to 1, the higher the accuracy is. In general, the area under the ROC curve is greater than 0.7, which indicates that the selected factor has good explanatory power27,42.
CLUE-S simulation and accuracy validation
Before using the CLUE-S model for futural LULC scenario simulation in mining area, the prediction accuracy needs to be verified. Based on the data of LULC in 2005, the spatial distribution pattern of LULC in 2016 was predicted firstly.
The modeling accuracy was evaluated based on the Kappa index by comparing the actual LULC map in 2016 with the simulated in 201627,43,44. Equation (4) gives one of the most popular Kappa index equations: i.e.,
$$begin{aligned} mathrm{Kappa}=frac{P_o-P_c}{P_p-P_c} end{aligned}$$
(4)
where (P_o) is the observed proportion correct, (P_c) is the expected proportion correct due to chance, (P_c) =1/n, n is the number of LULC types, and (P_p) is the proportion correct when classification is perfect.
In order to further verify the accuracy of the model simulation, we also calculated kappa for quantity (Kquantity).
Scenario setting of futural LULC simulation
Due to the continuous population growth and mineral exploitation in the study area, the land resources, especially farmland resources, have become increasingly scarce and the environment has been deteriorating. Based on the simulation and validated results during 2005-2016, we defined three scenarios—namely natural development scenario, ecological protection scenario, and farmland protection scenario—to predict LULC spatial patterns for 2025.
Natural development scenario
In this scenario, the land use demand of the study area was basically not restricted by policies in near future. We assumed that the change rate of each LULC type in near future was consistent with the change trend from 2005 to 2016. So it is defined as natural development scenario. Using Markov model to obtain the area transition probability matrix of each year from 2017 to 2025, and taking the proportion of each LULC type area in the total study area in 2005 as the initial state matrix, the area of each LULC type in 2025 under the natural development scenario was predicted.
Based on the characteristics and trend of the LULC change from 2005 to 2016, after appropriately adjusting the transition probability matrix of different LULC types, we predicted the demands of each LULC type in 2025 under ecological protection scenario and farmland protection scenario using Markov model45,46.
Ecological protection scenario
This scenario emphasizes protecting the ecological environment, restricting the conversion of the LULC types that have more regulatory effects on the ecosystem, such as tidal wetland and water area, to other land use types. Garden land, woodland, grassland, and aquaculture land, belong to other agricultural land, which have regulatory effects on the local ecosystem, so their conversion to other LULC types should be restricted as well.
Farmland protection scenario
According to the guidelines of “the general land use planning in Weishan County (2006-2020)”, we should maximize the potential use of current construction land, implement intensive and economical utilization of construction land, and use less or not use farmland to economical construction. So in order to ensure the dynamic balance of total farmland amount and the regional food supply security, in the farmland protection scenario, the conversion from farmland to other land use types should be restricted. The projected land use demands for 2025 under the three different scenarios are shown in Table 2.
Source: Ecology - nature.com