Terraces are a land type that is defined by its shape. They have a distinct morphological structure and edge features that distinguish them from other land types. In this study, we define terraces as agricultural land with strip or wavy sections built on slopes greater than 2° along the contour direction. Figure 1 depicts Google Maps satellite images of terraces in the Loess Plateau region. Terraces can be distinguished from other features in remote sensing images based on their colour, morphology, texture, and structure. Terraces can be distinguished from construction land, water, glaciers, and deserts by their colours. Figure 1b–d shows terraces that are primarily green and yellow. Furthermore, terraces are generally distributed along the contour direction, and can therefore be identified based on their morphology. Terraced field ridges curve downward and resemble strips in Fig. 1b,d or circles or ovals in Fig. 1c rather than a neat grid-like distribution. These features differ in morphology from the flat land shown in Fig. 1h. Based on texture and structure, the field area of terraces can be identified based on their strong edge features, as shown in Fig. 1b–d. The edges of terraces have dark stripes caused by oblique illumination received from the sun, and the field ridge of terraces often intercepts part of the sunlight due to their height. Sloping cultivated land, as shown in Fig. 1g, has no evident terraced wall. The outline of sloping cultivated land in the high-resolution image is curved, with no prominent edge features. These findings are critical differences distinguishing terraces and sloping land in high-resolution images.
Deep learning-based terrace extraction model
The DLTEM is a terrace extraction model that uses deep learning algorithms and other supplementary information. Initially, a preliminary terrace distribution map was obtained using a deep learning algorithm. It was then combined with the spectral and digital elevation model (DEM) elevation information to fine-tune the results. The final spatial distribution of the terraces was produced by manual correction (Fig. 2). Traditional land classification models or methods typically superimpose spectral, elevation, and morphological texture information from remote sensing images together for training, such as random forest, which is easily ignored in training since morphological texture information accounts for a relatively small amount of the total information. This leads to significant errors while identifying land classes with textural characteristics. In contrast, the DLTEM focuses on morphological texture information from remote sensing images and classifies it into land classes, followed by auxiliary correction through additional information. Thus, this method is more suitable to extract terraces enriched with texture structure information.
The UNet++ network is a classic deep learning algorithm that is uniquely unrivaled in extracting colour, morphology, texture, and structure features from images and applying them for classification. In comparison with other Convolutional Neural Network (CNN) classification models (e.g., Fully Convolutional Networks (FCN)), it has high classification accuracy, fast computation speed, strong robustness, and provides variable importance metrics. Therefore, in this study, the UNet++ network was adopted as the network framework for deep learning; the primary data source used was high-resolution satellite imagery from 2019. DEM (SRTM v4.1) data were used to obtain the elevation information and GlobeLand30 data were used to obtain the spectral information. The results were corrected to construct the final map of the distribution of terraces in the Loess Plateau.
Study area
The Loess Plateau, one of China’s four major plateaus, is located in northern central China (34°–40° N and 103°–114° E) (Fig. 1). It is covered by a thick loess layer that ranges in thickness from 50 to 80 m, and is the world’s largest loess deposition area, covering 648,700 km2. The altitude of the Loess Plateau ranges from 800 to 3,000 m, its average annual temperature is 6–14 °C, and its average annual precipitation is 200–700 mm. Since ancient times, the Loess Plateau has been used for agriculture because of its fine grains, fluffy soil texture, and rich soluble mineral nutrients, all of which are conducive to crop cultivation. However, long-term unsustainable land use caused the degradation of the vegetation cover in the Loess Plateau. Moreover, the land is degrading due to considerable nutrient loss caused by long-term water erosion in conjunction with natural conditions, such as arid climate, loose soil, concentrated and heavy rainfall. The fragmented ground in the region has made it susceptible to soil erosion. It has also become the primary source of Yellow River sediment as a result of the massive flow of eroded sediment into the Yellow River, posing a serious threat to the economic and social development of the lower Yellow River basin.
Terracing is one of the main measures used to enhance crop yield and conserve soil and water in the region. Since the 1980s, the Chinese government has implemented many large-scale slope-to-terrace projects in the Loess Plateau. Especially in recent years, the outline of the comprehensive management plan for the Loess Plateau area (2010–2030) has been promulgated with a planned area of 2.608 million hectares for slope to terrace conversion, making it the core area of slope to terrace conversion projects in the country.
Data preparation
Although high-resolution satellite images can be an important data source for the spatial distribution of terraces on the Loess Plateau, they are not ideal for terraces classification. On the one hand, a higher resolution image requires more storage space. On the other hand, it reduces the efficiency, prolongs the interpretation time, and increases the noise in the image, affecting the interpretation accuracy. Most of the terraces on the Loess Plateau are wider than 7 m (Fig. 1b–d). These are wide terraces in comparison with the narrow terraces of southern China (Fig. 1f), which are less than 2 m wide. Furthermore, it is also easy to mistake the fish-scale pits constructed for soil and water conservation for terraces because of their similarity in form. However, as the width of their field surface is less than 1.5 m, remote sensing images with a 2 m resolution can effectively prevent the false extraction of such features. Based on the actual situation of this study area, we chose a high-resolution image with a spatial resolution of 1.89 m from Google Maps 16 level as the data source. The colour, texture, and morphological features of terraces in the images show seasonal variations. In autumn and winter, the weather is dry, and the vegetation is less shaded in the Loess Plateau. During this time, even the edge features become more visible and easier to identify. As a result, we selected images from October 2018 to February 2019 whenever possible (Fig. 1c,d).
Deep learning network selection
Land classification is the extraction of land types from remote sensing images using image segmentation techniques. As the key technology of image segmentation, the Fully Convolutional Network (FCN) classifies images at the pixel level. FCN follows the network structure pattern of encoding and decoding, which adopts AlexNet as the encoder of the network and then employs transposed convolution to up-sample the feature map output from the final convolutional layer of the encoder to the resolution of the input image to achieve pixel-level image segmentation. However, due to the large error in image pixel boundary localization, Ronneberger et al.29 improved the FCN structure in 2015 by expanding the capacity of the network decoder by adding a contracting path to the encoding and decoding modules to achieve more accurate pixel boundary localisation29. The U-Net network is commonly used in medical image processing because it requires a small number of training samples and is effective in classifying objects with a fixed structure and limited semantic information. This network is comparable to natural image semantic segmentation such as Deeplab v3+, which has a smaller number of model parameters and the same effect.
Since the texture and morphological features of terraces and human organs have certain similarities, they are primarily manifested by simple semantic information contained within the terrace images themselves. Thus, high-level semantic information and low-level features of such images become more important. However, high-resolution images are more complicated and variable than medical image patterns, and errors in terrace extraction edge identification using the U-Net network, such as boundary segmentation of terraces and flatlands, still occur. To fully utilize the semantic information of the network, we adopted a nested U-Net architecture, namely the UNet++ network proposed by Zhou et al.28. The network integrates long-connected and short-connected architectures to capture features at different levels by adding a shallower U-Net structure and integrates them via feature superposition to make the scale difference of feature maps smaller when fused to enhance the correct rate of image segmentation edges. However, because the U-Net++ network increases the number of model parameters, this study adopted the sparse matrix approach to accelerate model training and decrease the number of parameters.
Data pre-processing
Data pre-processing is a prerequisite for UNet++ network training, that is, valid input according to the standard format annotation before training can be performed. Since the UNet++ network proposed by Zhou et al.28. is primarily used for medical images, which have characteristics such as fixed image structure, no spatial information, and less pattern variation, labelling medical images is comparatively easier using this method. In contrast, high-resolution remote sensing images have a large number of rasters, many pattern changes, irregular image structure, and spatial information. Therefore, determining how to better annotate high-resolution remote sensing images and reduce the annotation workload becomes critical. First, we vectorized the training sample area and generated the terrace vector dataset using ArcGIS with a high-resolution remote sensing image as the primitive map. Second, we converted the terrace vector dataset into raster data. The information of the raster had to be identical to that of the primitive map, including the size of the raster, its processing range, and its coordinate system. The output was converted to TIFF format to complete the image annotation. Since the raster size input to UNet++ network training is a fixed size, it is much smaller than the original image. To simplify the process of inputting the original image and its annotation information, we added an image import module to DLTEM, which was a sliding window of 400*400, and read the image automatically by setting the corresponding judgement conditions. Finally, the entire high-resolution image was processed automatically into the model in accordance with the established rules for training.
The goal of the data enhancement was to improve the universality and robustness of the UNet++ network training results. As mentioned above, the high-resolution images taken simultaneously often included clouds or other anomalies in some areas, as the images were stitched together using multiple sources of data fusion. This can easily form evident stitching traces (Fig. 1c,d) due to the different shooting times and image quality of various data sources, i.e., brightness, saturation, and colour contrast of the images. Thus, the model trained on the original image data has strong limitations, and in many scenes, there are notable matrix-type misclassification regions due to image differences, making extraction work challenging. Therefore, in this study, we first adjusted the brightness, grayscale, and contrast of the training data after input to enhance its colour feature recognition ability. We then altered the scaling of the image, and rotated and transformed the training image from 0° to 360° to enhance morphological feature recognition and the accuracy of the training network in terrace extraction.
Parameter setting
The network parameter setting is the most critical hyperparameter for UNet++ network training. They are mainly divided into input image size, batch size, learning rate, number of iterations, objective function, gradient descent strategy, momentum, decay rate, and activation function. Among them, we set the image size to 400*400 pixels based on the actual situation of the terraced area, where the UNet++ network has four scaling times, and the image size must be a multiple of 16. The batch size primarily affects the convergence of the model. If the batch limit is set to one, the model is easily affected by the random perturbation phenomenon and cannot converge to find the optimal solution. Since the batch size is determined by the size of the video memory, the value of the batch is limited by equipment constraints. The model in this study used a 2080Ti video card with 11 GB of video memory, and the batch was set to 8. The learning rate, gradient descent strategy, and objective function play a role in whether the network can find the best classification model better and faster. The learning rate was set to 0.001 for the first 500 generations, with the goal of achieving fast convergence to the target region. The learning rate was then set to 0.0001 for 500–1,000 generations, and the model was fine-tuned by choosing a smaller learning rate to find the model with the highest classification accuracy. Adam was chosen for the gradient descent strategy. The momentum and adaptive learning rate were used to increase the convergence rate. The cross-entropy classification loss function was chosen as the objective function to improve the differentiation between terraced and non-terraced areas. Momentum, decay rate, and activation function were all adopted from the previous default settings of the UNet++ network.
Data correction
In this study, we primarily used high-resolution images from Google Earth as the data source to extract the distribution of terraces on the Loess Plateau. Because this image source only contains a large amount of texture structure information and no vegetation information, it is easy to misjudge and misclassify features with the same morphological structure and edge features, such as permanent snow and ice, water bodies, bare land, and artificial surfaces. Vegetation information was generally processed based on waveband data from multispectral/hyperspectral images. It requires topographic correction, atmospheric correction, radiometric calibration, de-clouding, and other operational processes, which are extremely sophisticated30.
GlobeLand30 is a 30 m spatial resolution global surface coverage dataset developed by the National Geomatics Center of China. The most recent GlobeLand30 dataset (v2020) has been updated with data sources from 2017 to the present. Its extensive data sources enable effective reduction of the impacts of cloud cover, with an overall accuracy of 85.72%. The classification accuracy of permanent snow and ice, water bodies, bare land, and artificial surfaces of this dataset is as high as 75.79%, 84.70%, 81.76%, and 86.70%, respectively. Since the update time of v2020 data is similar to that of high-resolution images, it can be used as correction data for vegetation information31.
Since the training image data are two-dimensional planar data with no elevation or slope information (Fig. 1g), certain flat fields with visible field bumps are easily misclassified as terraces. The Space Shuttle Radar Topography Mission (SRTM v4.1) DEM has a spatial resolution of 30 m and ranges from 60° N to 56° S, completely covering the Loess Plateau32,33. In this study, these data were treated as terrain correction data. The amendment standard corrects the areas that have been extracted as terraces below 2° to non-terraced areas according to the requirements of the Ministry of Natural Resources of China.
The spatial resolution of our extracted terraces is 1.89 m, whereas the spatial resolution of GlobeLand30 and DEM as correction data sources is 30 m, which is difficult to meet the requirements of data processing. Hence, we up-sample the two correction data sources, and then used multi-source data fusion. First, we extracted and up-sampled the terraced areas of glaciers, rivers, and deserts from GlobeLand30 to a spatial resolution of 1.89 m. Secondly, we up-sampled the DEM to 1.89 m using spatial interpolation for its raster centre as the true value of the region and performed a slope calculation for the up-sampled DEM. Further, the spatial distribution maps of glaciers, rivers, deserts, and slope maps of the Loess Plateau with the same resolution as the spatial distribution maps of terraces were available. Finally, we superimposed these images, used the terrace range in the TDMLP as a mask, and assessed the pixels in the mask area one by one. If a pixel belonged to permanent snow and ice, a water body, bare land, or an artificial surface, or had a slope less than 2°, it was modified to the background value. Otherwise, the original value was retained.
We made artificial corrections to the data based on the extracted results for the arid areas of the Loess Plateau as well as for the flatter basins, given that these areas do not feature terraces.
Training and validation data
For supervised classification, the selection of sample areas and sample features is crucial. The focus and core of any land classification work is representative and effective training sample selection. To obtain a better sample area selection, we considered the selection of sample areas from three perspectives, i.e., colour texture features, topographic features, and spatial distance of the training samples. First, the terraces in this study are in agricultural land, including cultivated land, woodland, grassland, and other types of land; thus, different types of land will present different texture details. At the same time, high-resolution images from Google Earth are mosaicked. Because of the different acquisition times, the same region and land type will have visible colour differences and stitching traces, which is more common in the Loess Plateau region. Therefore, these factors should be considered in the selection of training samples as much as possible to improve the generability of the model and the correct rate of its extraction. Second, the state of the terraces varies according to topographic features. Among them, gradient, direction, altitude, and climate are the most significant factors. Terraces can be categorised as shallow-slope or steep-slope terraces. Based on slope aspect, altitude, and climate characteristics, they can also be categorised as either easy to identify or hard to identify. Thus, the sample should be inclusive of these types of terraces. According to the first law of geography, terraces in different spatial locations have different morphologies. Therefore, the spatial location of the samples should also be at a certain distance.
In summary, we selected one county in each region based on the geomorphic zoning characteristics of the Loess Plateau. In addition, we added one more in the area where the density of terraces may be higher. Finally, we selected the whole area of seven counties (Fig. 3) as the training sample area distribution, covering 2.18% of the overall Loess Plateau area. The colour morphological features, topographic features, spatial location, and imaging quality of terrace images in these regions are highly representative. This method was unique from other classification methods. Most of the traditional methods are based on the single-pixel information of feature layers such as random forests, which tend to ignore the neighbouring information around the point, and thus are subject to misclassification and under classification for land types with outstanding texture information. In our study, we adopted the visual interpretation of the whole domain, which can cover the neighbourhood information of each pixel point more comprehensively. To ensure the uniformity and correctness of visual interpretation, the terraces in the training area were visually interpreted by seven interpreters after uniform professional training. For the disputed and uncertain areas, the seven interpreters carried out interactive interpretation and scoring according to the interpretation results. Finally, two other interpretation experts made the final review and corrections. The interpretation results of the training area were re-examined and revised based on the results of the later interpretations.
To better assess and compare the validity and correctness of the terraced agricultural area datasets on the Loess Plateau in quantitatively, the validation dataset was divided into two parts: a per-pixel point-based validation set and a field validation dataset of terraces with location information. The extracted datasets were comprehensively evaluated in terms of both pixel scale and field validation.
We constructed a single-pixel validation point that evaluates the TDMLP. We applied the Icosahedral Snyder Equal Area Discrete Global Grid created by ArcGIS. Based on this strategy, the study area was partitioned into 972 regions (Fig. 3). To better validate the terrace classification results (excluding non-terrace classes), we placed more validation points within the grid where the terrace distribution is more concentrated. First, we calculated the proportion of terraces in each hexagonal grid to the total area of the hexagonal grid. Second, we separated the terraces into four levels according to the proportion of terraces to the whole grid area as 0–20%, 20–50%, 50–80%, and 80–100% and the number of validation points was 10, 20, 40, and 50, respectively.
Since the proportion of the extracted terraced area to the total area was only 14%, direct random point deployment would have led to fewer terraced validation sets and thus would have affected the final data evaluation. Therefore, in the deployment strategy, we ensured that the validation points distributed in the extracted terraces in each grid account for at least one-fifth of the total number of validation points, but for the grid with a smaller proportion of terraces or even 0, this practice was meaningless. Hence, we stipulated that in the grid with a proportion of terraces ≤1%, direct random scattering was to be performed. The final scattered verification points in the terraced and non-terraced areas were 5,194 and 6,226, respectively, with a ratio close to 1:1 for easy verification. The spatial distribution is shown in Fig. 3.
We validated the spatial distribution map of terraces on the Loess Plateau from 14 April 2021 to 1 May 2021 and constructed a field validation dataset of terraces with location information. Considering the longitudinal, latitudinal, and vertical heterogeneities of the Loess Plateau, the verification route was divided into two sections, north to south and east to west, to more comprehensively cover all regions of the Loess Plateau. The verification route started at Hohhot in the northeast of the Loess Plateau. It passed through the Datong Basin, followed the Yellow River to the south and the Weihe Plain, and then travelled westward through Mount Liupan to the westernmost part of the Loess Plateau. The route was through 54 counties/districts in 16 cities and six provinces on the Loess Plateau, with a total distance of 3,680 km, covering 15.8% of the counties on the Loess Plateau (total of 341 counties). We also surveyed and sampled the verification points approximately every 5 km along the route and collected data from a total of 815 sample points, covering various types of terraces on the Loess Plateau. The results are shown in Fig. 3.
Source: Ecology - nature.com