ReaLSAT, a global dataset of reservoir and lake surface area variations

In this section, we provide a quantitative evaluation of both the spatial coverage and the temporal dynamics of the ReaLSAT dataset.

Spatial coverage

Since the dataset was created using satellite imagery analysis, it can provide more comprehensive coverage than existing datasets. However, using an automated process also has its challenges. It inevitably leads to the detection of some spurious waterbodies because of issues in the data (e.g., errors in the GSW maps used as inputs to ReaLSAT).

To provide more insight into the types of lakes and potential issues in the spatial coverage of ReaLSAT, we randomly sampled 5,000 lakes out of the 435,717 that are present only in ReaLSAT (i.e., not available in the HydroLAKES dataset). A human annotator used Google’s satellite imagery base layer to categorize these lakes. Figure 5a shows the geographical distribution of these lakes, and Fig. 5b shows the distribution of different lake types in the sample set. Out of the 5,000 lakes, the human annotator identified 2,019 traditional lakes and reservoirs where sufficient water was visible in the satellite imagery. Another 551 lakes in the sample set showed signs of a bowl-like depression but with no (or very little) water visible in the satellite imagery and were labeled as ephemeral. A further 861 lakes were tagged as farm ponds because they showed geometric patterns of farming in the imagery. This diversity of waterbody types discovered by ReaLSAT but previously unreported by HydroLAKES highlights one of the strengths of our approach. In limnology, the origin or type of a lake is a very important regulator of ecosystem dynamics. For instance, reservoirs will have faster water flow and lower residence time than natural lakes, and therefore nutrient and carbon processing rates will differ; floodplain lakes may dry periodically, leading to the denudation of sediments; and farm ponds will likely have much higher rates of nutrient loading and methane production than non-agriculturally influenced lakes. Hence, capturing a more comprehensive range of waterbody categories can enable scientific studies in which knowing the origin or type of a lake is critical to understanding the underlying process.

Fig. 5

(a) Geographic location of 5000 randomly selected lakes used for manual evaluation of lake type. (b) Allocation of the 5000 manually referenced lakes to specific lake types. Regular implies a traditional lake or reservoir. Unverifiable implies that the lake type could not be identified based on the available Google Earth imagery.


Along with the lentic water types discovered in the sampled set, we also found that ReaLSAT identified 603 river segments missed by our morphological score filter. As stated earlier, this is an inherent challenge with automated approaches that use a fixed score threshold for eliminating river segments. Another 239 lakes were tagged as wetlands because of significant vegetation inside and around the lake polygon. There were also 97 lakes that were adjacent to rivers, which were labeled as riverine or floodplain lakes that were formed as a result of river channels meandering over time. Furthermore, there were 59 lakes where the polygons represented only a small portion of a larger lake and were labeled as partial. Finally, for 571 polygons, there was not enough evidence to tag them in any of the above categories. Since Google imagery represents only a single snapshot in time, these 571 waterbodies could not be definitively labeled as spurious (hence, they were labeled as unverifiable), highlighting a limitation of this evaluation pipeline. In particular, a vast majority of these waterbodies appear to be ephemeral based on their surface area timeseries (completely dry for extended periods of time). Hence, if the satellite imagery layer is from one of these timesteps, the annotator would not be able to confirm the presence of the lake.

To assess whether we would obtain a similar distribution of different waterbody categories in existing datasets, we performed a similar evaluation on another 5,000 lakes sampled from ReaLSAT where each polygon has some overlap (greater than 1 pixel) with a polygon from HydroLAKES. In this sampled set, the annotator identified 4,030 lakes as traditional lakes or reservoirs, 370 as ephemeral, 138 as farm ponds, 6 as river segments, 66 as wetlands, 95 as riverine or floodplain lakes, 20 as partial, and 275 as unverifiable.

Compared to the previous distribution, this set of 5,000 waterbodies contains relatively fewer river segments and wetland polygons, because these categories were manually identified and removed during the creation of the HydroLAKES database⁶. Similarly, this set contains relatively few farm ponds because HydroLAKES was created by manual curation of existing static databases and hence does not contain farm ponds that were created in more recent years.
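The contrast between the two samples is easier to see as percentages. The following minimal Python sketch tabulates the annotator counts reported above; the category keys and the helper `shares` are illustrative names chosen here, not identifiers from the ReaLSAT release.

```python
# Annotator counts for the two 5,000-lake samples, as reported in the text.
# The category keys are illustrative labels chosen for this sketch.
realsat_only = {
    "regular": 2019,
    "ephemeral": 551,
    "farm pond": 861,
    "river segment": 603,
    "wetland": 239,
    "riverine/floodplain": 97,
    "partial": 59,
    "unverifiable": 571,
}
hydrolakes_overlap = {
    "regular": 4030,
    "ephemeral": 370,
    "farm pond": 138,
    "river segment": 6,
    "wetland": 66,
    "riverine/floodplain": 95,
    "partial": 20,
    "unverifiable": 275,
}


def shares(counts):
    """Return each category's share of the sample, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


realsat_only_pct = shares(realsat_only)
hydrolakes_overlap_pct = shares(hydrolakes_overlap)

# Print the two distributions side by side.
for name in realsat_only:
    print(f"{name:22s} {realsat_only_pct[name]:6.2f}%  "
          f"{hydrolakes_overlap_pct[name]:6.2f}%")
```

With these counts, river segments account for about 12% of the ReaLSAT-only sample but about 0.1% of the HydroLAKES-overlapping sample, and farm ponds for about 17% versus about 3%.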

Temporal dynamics

To assess the quality of surface extent maps, we performed a quantitative evaluation on a random selection of extent maps. These extent maps were compared against reference maps created by a human annotator using a semi-automated pixel classification procedure. This strategy of creating reference maps is used extensively in the remote sensing literature (e.g., see refs. 36,37,38,39). Next, we describe our evaluation process in detail.

Sample selection

There are 462,574 lakes out of 681,137 total lakes where the label updates (corrections and imputations) by the ORBIT approach have trust scores within our chosen thresholds (as described in the methods section). To evaluate these candidate lakes effectively, we focus on lake extent maps where the ORBIT approach resulted in a different map than the underlying GSW-based extent map. Hence, we remove maps where no updates were made by the ORBIT approach (neither corrections nor imputations) from the candidate pool of extent maps used for evaluation. We also remove maps where the percentage of missing labels is more than 90%, because these maps tend to suffer from significant cloud coverage, which makes it challenging to generate reference maps. Since the GSW dataset has a significant amount of missing data for most places in the world before 2000, we evaluated maps only from 2000 onwards. These three filters left us with a total of 51,077,278 water extent maps considered for selection. Figure 6a shows the distribution of the percentage of pixels updated by the ORBIT approach in these water extent maps. To evaluate the robustness of our approach in comparison to GSW maps, we randomly selected 10,000 water extent maps such that extents with significant updates are given higher weight, which reduces the skew of the distribution towards extents with relatively few updates (Fig. 6b).
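The text does not state the exact weighting scheme used for this selection. As one possible illustration only, the sketch below draws a sample without replacement in which each map is weighted by the inverse frequency of its "percentage of pixels updated" bin, so that sparsely populated bins with large updates are selected more often. The function name, the bin count, and the inverse-frequency weights are all assumptions made for this example, not the authors' procedure.

```python
import numpy as np


def sample_weighted_by_update(pct_updated, n_samples, n_bins=10, seed=0):
    """Sample map indices, up-weighting maps from sparsely populated bins.

    Hypothetical scheme: weight = 1 / (number of maps in the same bin).
    """
    pct = np.asarray(pct_updated, dtype=float)
    edges = np.linspace(0.0, 100.0, n_bins + 1)
    # Bin index per map; clip so that exactly 100% falls in the last bin.
    bins = np.clip(np.digitize(pct, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins)
    weights = 1.0 / counts[bins]
    probs = weights / weights.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(pct.size, size=n_samples, replace=False, p=probs)


# Toy candidate pool (made-up values): 900 maps with 2% of pixels updated
# and 100 maps with 80% of pixels updated.
pct_updated = np.concatenate([np.full(900, 2.0), np.full(100, 80.0)])
selected = sample_weighted_by_update(pct_updated, n_samples=50)
print(pct_updated.mean(), pct_updated[selected].mean())
```

In this toy pool, each of the two occupied bins receives the same total weight, so the selected maps contain a much larger share of heavily updated extents than the candidate pool does.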

Fig. 6

Distribution of updates made by the ORBIT approach. (a) Distribution over the candidate water extent maps. (b) Distribution over the 10,000 randomly selected water extent maps used for evaluation.


Sample pruning

From these randomly selected water extent maps, we removed maps for which a reference map could not be generated due to clouds or the inability of the annotator to distinguish between land and water. A final set of 2,095 water extent maps was considered for evaluation. Figure 7a shows the distribution of percentage updates in the final set of evaluation extents, and Fig. 7b shows the geographical distribution of these extent maps.

Fig. 7

Summary of the dataset used for evaluating water extent maps. (a) Distribution of updates made by the ORBIT approach in the water extent maps selected for evaluation. (b) Geographical location of the lakes in the evaluation set.


Reference map generation

For these water extent maps, we created ground truth reference maps using a semi-automatic labeling process³⁷,³⁸,³⁹. Specifically, the annotator selects land and water samples to train an SVM (Support Vector Machine) classification model for each image. The annotator keeps adding samples until a stable map is generated. As a final step, the annotator masks out pixels affected by clouds, cloud shadows, and any other region where the annotator is not confident about the accuracy of the reference labels. This process enables a quick and robust generation of reference maps. Supplementary Fig. S7 shows one of the reference maps in the evaluation set. While this strategy of comparing maps is different from the traditional approach of comparing pixels (often selected using stratified sampling), it provides a much more exhaustive evaluation of surface extent maps. The reference maps used for evaluation in this study are also available as part of the dataset.

Comparison

To compare the extent maps generated by ReaLSAT with the reference maps, we used accuracy as the evaluation metric, a widely used measure of the quality of classification maps. Accuracy is defined as the ratio of the number of correctly labeled pixels to the total number of pixels. Specifically, we assign 1 to water pixels and 0 to land pixels. Since GSW-based extent maps contain missing labels, pixels with missing labels are assigned a value of 0.5 to reflect the uncertainty between land and water. Accuracy is then calculated as follows:

$$Accuracy=1-\frac{1}{R\ast C}\sum\limits_{i=1}^{R}\sum\limits_{j=1}^{C}\left|ReferenceMap[i,j]-PredictedMap[i,j]\right|$$

(2)

where R is the number of rows and C is the number of columns in the map.
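As a minimal sketch of Eq. (2), the following Python function computes the accuracy of a predicted map against a reference map using NumPy. The function name `map_accuracy`, the constant `MISSING`, and the toy arrays are illustrative assumptions; the sketch does not handle the pixels that the annotator masks out of the reference maps.

```python
import numpy as np

MISSING = 0.5  # value assigned to missing labels in GSW-based maps


def map_accuracy(reference_map, predicted_map):
    """Accuracy = 1 - mean absolute difference between two label maps.

    Labels are 1 for water, 0 for land, and 0.5 for missing.
    """
    reference = np.asarray(reference_map, dtype=float)
    predicted = np.asarray(predicted_map, dtype=float)
    if reference.shape != predicted.shape:
        raise ValueError("maps must have the same shape")
    # The mean over all R * C pixels implements the 1 / (R * C) double sum.
    return 1.0 - np.abs(reference - predicted).mean()


# Toy 2 x 3 example (made-up values, not taken from the dataset).
reference = np.array([[1, 1, 0],
                      [0, 0, 0]])
gsw_like = np.array([[1, MISSING, 0],
                     [0, 1, 0]])
accuracy = map_accuracy(reference, gsw_like)
print(accuracy)
```

In the toy example, one missing pixel contributes an error of 0.5 and one false water pixel contributes an error of 1, giving an accuracy of 1 - 1.5/6 = 0.75.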

When the accuracies of ReaLSAT and GSW labels are compared, a vast majority of points lie above the diagonal 1:1 line, which implies that ReaLSAT labels were more accurate overall (Fig. 8a). The points in the scatter plot are colored by the percentage of pixels where GSW labels were missing. To better show the improvement in ReaLSAT labeling, we plot the distribution of the difference in accuracy values between the two datasets in Fig. 8b. A positive value indicates that the surface extent map from the ReaLSAT dataset had better accuracy than the map from the GSW dataset, and vice versa. For ease of visualization, we plot this distribution after excluding cases where the accuracy from both datasets was equal. The distribution is skewed toward positive values, which demonstrates the efficacy of the ORBIT approach.

Fig. 8

Comparison of accuracy values using GSW labels vs ReaLSAT labels. (a) Scatter plot of accuracy values using GSW labels vs ReaLSAT labels. (b) Histogram of the difference in accuracy between ReaLSAT labels and GSW labels. Positive values represent cases where ReaLSAT labels were more accurate than GSW labels. (c) Histogram of the difference in accuracy values for the scenario where pixels labeled as land by both products as well as the ground truth were removed, to reduce the skew that surrounding land pixels introduce in the accuracy values.


Note that the shape of a lake will influence the number of land pixels surrounding it, which might bias the accuracy values. For example, the reference map shown in Supplementary Fig. S7 consists of more than 70% land pixels. To address this bias, we also calculated accuracy values after removing pixels that were labeled as land by both datasets as well as the ground truth. This variation allows a stricter evaluation of water extent maps. Figure 8c shows the distribution of the difference in accuracy values under this scenario (after excluding cases with equal accuracy). As shown, a vast majority of the distribution still lies at positive values. Furthermore, the distribution has a larger spread towards high positive values, suggesting a significant improvement made by the ORBIT approach.
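The stricter variant can be sketched as follows: pixels labeled as land (0) in the reference map, the GSW map, and the ReaLSAT map are dropped before the accuracy of each product is computed on the remaining pixels. The function name and toy arrays are illustrative, and treating missing GSW pixels (0.5) as not land, so that they are retained, is an assumption of this sketch.

```python
import numpy as np


def accuracy_excluding_shared_land(reference, gsw, realsat):
    """Accuracy of GSW and ReaLSAT maps after dropping pixels that all
    three maps label as land (0). Missing GSW labels (0.5) are kept."""
    reference = np.asarray(reference, dtype=float)
    gsw = np.asarray(gsw, dtype=float)
    realsat = np.asarray(realsat, dtype=float)
    shared_land = (reference == 0) & (gsw == 0) & (realsat == 0)
    keep = ~shared_land
    if not keep.any():
        raise ValueError("no pixels left after removing shared land")
    gsw_acc = 1.0 - np.abs(reference[keep] - gsw[keep]).mean()
    realsat_acc = 1.0 - np.abs(reference[keep] - realsat[keep]).mean()
    return gsw_acc, realsat_acc


# Toy 2 x 4 example (made-up values): a small lake surrounded by land.
reference = np.array([[1, 1, 0, 0],
                      [0, 0, 0, 0]])
gsw = np.array([[1, 0.5, 0, 0],
                [0, 0, 0, 0]])
realsat = np.array([[1, 1, 0, 0],
                    [0, 0, 0, 0]])
gsw_acc, realsat_acc = accuracy_excluding_shared_land(reference, gsw, realsat)
print(gsw_acc, realsat_acc)
```

In this toy case the full-map accuracy of the GSW-like map would be 1 - 0.5/8 = 0.9375, whereas after the six shared land pixels are removed it drops to 0.75, while the ReaLSAT-like map stays at 1.0. The difference between the two products is therefore larger once the surrounding land no longer dilutes the errors.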

From Fig. 8, we can see that in some cases ReaLSAT-based extent maps are less accurate than GSW-based maps. As described earlier, violation of the assumptions made by the ORBIT approach could lead to the observed poor performance. Out of 2,095 extent maps, GSW labels show better accuracy than ReaLSAT labels for 323. On visual analysis of the errors in these maps, we found that 165 maps differ only slightly, at the lake’s boundary. We categorized the remaining 158 extent maps based on the reason behind the observed poor performance. In particular, 45 maps have poor performance due to occlusion of the water surface by algae, 18 maps contain farm ponds, 8 contain mining lakes, 27 maps have unreliable bathymetry, 30 maps have issues due to the weighting factor used by the ORBIT approach, and 30 maps have class-conditional missing data. All the reference maps and the corresponding maps from GSW and ReaLSAT are provided with the dataset.

Next, we describe some of these cases in detail.

Impact of algae: It can be difficult to visually differentiate surface algae or floating aquatic plants from terrestrial vegetation⁴⁰, as they have similar reflectance spectra. Therefore, surface algal blooms often get incorrectly labeled as land in the reference maps. However, in most cases, the appearance and disappearance of algae on a lake are independent of the bathymetry. Thus, algae pixels get detected as physically inconsistent by the ORBIT approach, and consequently, these pixels are updated based on the labels of other pixels without algae. In many cases, while the accuracy with respect to the reference map is poor (because algae get labeled as land), ReaLSAT-based extent maps are closer to the true extent of the lake. For example, Supplementary Fig. S8 illustrates the impact of algae on the extent mapping of Center Lake, Texas. In this example, the bimodal distribution of fraction values (either low or high) reveals high confidence in lake persistence (Supplementary Fig. S8b). On Oct 22, 2008, false-color composite processing of Landsat-5 imagery reveals a strong vegetative signal on the west side of the lake (Supplementary Fig. S8c). Since we know that this is a lake, we can assume that the west side of the lake is experiencing a large surface algal bloom with a reflectance similar to that of the surrounding terrestrial landscape. Because of the strong vegetative reflectance signal, the semi-automated reference mapping labels the west side of the lake as land (Supplementary Fig. S8d), as do most GSW labels (Supplementary Fig. S8e). Conversely, the ReaLSAT extent map labels the west side of the lake as water (Supplementary Fig. S8f). However, we calculate accuracy based on the semi-automated reference map (Supplementary Fig. S8d). As a result, the GSW extent map is scored as more accurate than the ReaLSAT map, even though this is not the case, because the reference map is incorrectly labeled. Therefore, some negative accuracy differences may misrepresent reality due to surface algal blooms.

Impact of variable bathymetry: Even though we tried to remove lakes with unreliable bathymetry by using the score-based filters defined in an earlier section, not all cases were removed. For example, agricultural ponds often have small sections that are connected and change shape based on agricultural needs. Supplementary Fig. S9 highlights an example of labeling issues on agricultural ponds in Mexico. In this area, satellite imagery and the GSW fraction map confirm the presence of agricultural ponds (Supplementary Fig. S9a,b). These individual ponds are filled and drained based on operational decisions and do not follow a consistent pattern of growing or shrinking. Thus, the ORBIT approach can introduce spurious updates in water extent maps for these farms. In the Landsat-5 imagery from 2009-10-08, some of the ponds are dry, while others are filled (Supplementary Fig. S9c). This distribution of water is evident from a visual inspection and is confirmed in the semi-automated reference map (Supplementary Fig. S9d). Due to the similar elevations of the individual pond sections, the ORBIT approach spuriously fills the remaining sections with water based on the incorrectly learned bathymetry (Supplementary Fig. S9f). While quantification of such uncertainties is outside the scope of this paper, we hope that the wider research community can use ReaLSAT to address such questions. In particular, changes in the bathymetry of a lake can be identified using spatio-temporal patterns in the label corrections. Specifically, if the elevation of some pixels in a lake increases after a certain time (e.g., sediment deposits raising the elevation of a pixel), they will appear as physically inconsistent to the ORBIT framework, and hence the labels for these locations will be changed from land to water much more frequently after this increase in elevation.

Impact of bias in errors and missing data: As mentioned earlier in the methods section, based on our observation, the confidence of water labels is higher than that of land labels in the GSW dataset. To account for this bias, we used a weighting factor of 3 for the water class. While this weighting factor improves the ORBIT approach’s performance in most cases, it leads to an overestimation of water for some lakes. For example, Supplementary Fig. S10 compares the water extent maps with and without the weighting factor for a small reservoir in eastern Brazil. As we can see, the GSW labels contain false positives, and due to the weighting factor of 3, ORBIT prefers to update the land labels to water, which further increases the number of false positives, as shown in Supplementary Fig. S10e. However, if we use a weighting factor of 1 for this example, the ORBIT approach can effectively remove many of the false positives in the GSW map, as shown in Supplementary Fig. S10f.

Similarly, apart from missing data due to clouds in the GSW dataset, there can also be missing values at pixels where the GSW classification model is not confident. Hence, for some water extent maps, class-dependent missing data (as opposed to class-independent missing data) adversely impacts the ORBIT approach. For example, Supplementary Fig. S11 shows a water extent map for Zhongleng Reservoir in China, where the missing data along the eastern edges is not class independent but results from ambiguous pixels around the lake where the GSW approach was not confident. In such a scenario, the ORBIT approach relies heavily on information from nearby timesteps to infer labels for missing pixels, leading to errors in ReaLSAT maps if there is a significant variation in lake extent in nearby timesteps, as shown in Supplementary Fig. S11e.
