    High-frequency monitoring reveals a CO2 source-sink shift in a subtropical eutrophic urban lake

    Abstract

    Eutrophic urban lakes with CO2-supersaturation represent potential carbon (C) sources; however, the drivers behind the reported C-source–sink shift remain poorly understood. This study provides a systematic assessment of daytime/seasonal pCO2 and fCO2 dynamics in a subtropical moderately eutrophic urban lake (Bailuwan, China), based on over a year of high-frequency monitoring, aiming to clarify the mechanisms regulating CO2 exchange at the water–air interface in such ecosystems. Our work revealed predominantly daytime declines in pCO2 (and fCO2) across 12 sampling days, though morning–afternoon differences were not significant (n = 24). Episodic undersaturation events were newly observed in October 2020 and March 2021, contrasting with the prevailing supersaturation. Annual mean values (n = 48) reached 1789 µatm (pCO2) and 130 mmol m−2 h−1 (fCO2). Critically, we identified a pronounced semi-annual divergence: mean pCO2 from January to June (1239 µatm) was markedly lower than from July to November (3143 µatm). Both periods maintained a net source status (> 420 µatm), lacking the typical spring-sink/summer-source transition reported in previous studies. Key regulators, such as pH, chlorophyll a, and dissolved oxygen, influence C-sink-source dynamics, with eutrophication further modulating these shifts. These findings highlight the need for targeted strategies to reduce pollutants and enhance carbon sequestration in urban lakes.

    Introduction

    Freshwater ecosystems, encompassing lakes, rivers, and wetlands, are pivotal in the biogeochemical dynamics of greenhouse gases (GHGs), with impacts extending from local to planetary scales1,2,3,4,5,6. In particular, inland lakes, despite their limited spatial coverage (~ 3.7% of non-glacial terrestrial surfaces)7, are key players in carbon cycling through sequestration, transport, and transformation mechanisms8,9,10. Empirical studies have consistently demonstrated that approximately 90% of global inland lakes exhibit carbon dioxide (CO2) supersaturation relative to atmospheric equilibrium11,12. Global estimates suggest that CO2 emissions from lakes range from 0.07 to 0.15 Pg C year−1, with upper estimates reaching as high as 0.57 Pg C year−113,14,15. With a spatial extent representing 6.2% of global lake coverage (25°N–54°N), lakes in China release roughly 15.98 Tg C year−1 in the form of atmospheric CO2 emissions16,17,18. Unexpectedly, shallow lakes with extensive surface areas but depths of less than 3 m have been identified as critical hot-spots for CO2 emissions owing to their high biogeochemical activity and efficient gas exchange19,20. The consistent pattern of CO2 supersaturation in these aquatic environments, with CO2 levels exceeding the atmospheric equilibrium range of 380−420 µatm, provides clear evidence of their role as net CO2 sources to the atmosphere21,22,23,24.

    From a global spatiotemporal perspective, the CO2 emitted into the atmosphere is rebalanced through biogeochemical mechanisms such as photosynthetic assimilation, sediment burial, and oceanic uptake, collectively contributing to the dynamic equilibrium of the global carbon cycle25,26. From 2007 onward, the oceans have sequestered about 56% of human-induced carbon emissions, resulting in detectable ocean acidification (pH), with the residual 44% accumulating in the atmosphere3,5,27,28.
Growing empirical evidence documents the conversion of particular lake ecosystems from net carbon sources to net sinks, with some currently exhibiting transitional carbon dynamics29,30,31. Systematic monitoring data presented by Xiao et al.32 showed a significant downward trend in CO2 outgassing rates from Chinese lakes between the 1980s and 2000s, pointing to their gradual transformation from C-releasing to C-sequestering ecosystems. This transformation may be attributed to synergistic interactions among biogeochemical processes (e.g., enhanced C-sequestration and organic matter burial), shifting environmental conditions (e.g., nutrient loading and hydrological regimes), and targeted anthropogenic interventions (e.g., ecological restoration and eutrophication control)25,33,34,35,36. Consequently, systematic documentation of C-source-sink dynamics and their regulatory controls in lacustrine ecosystems is imperative to refine predictive models of C-exchange and quantify their contributions to regional C-budgets under evolving climatic and anthropogenic pressures.

Watershed urbanization generates synergistic perturbations to aquatic ecosystem processes, wherein combined effects of anthropogenic nutrient loading and riparian habitat fragmentation lead to fundamental alterations in carbon transformation pathways and greenhouse gas exchange in urban water systems37,38. Urban lakes, as highly sensitive freshwater ecosystems, are particularly susceptible to algal blooms, resulting in eutrophication35,39,40,41. In this study, an urban lake is defined as a lentic water body situated entirely within a metropolitan area, whose hydrological processes, water quality, and ecological functions are primarily governed by anthropogenic activities15,41,56,89.
Key characteristics include: (i) heavily modified hydrology through water level control and engineered shorelines; (ii) significant nutrient inputs from urban runoff and wastewater; (iii) altered ecological communities due to habitat modification and recreational use; and (iv) serving dual roles in both receiving urban discharges and providing ecosystem services such as flood mitigation and recreation25,26,42. Comparative studies reveal dramatic differences in carbon emissions between two Indian lakes: the hypereutrophic Belandur Lake exhibits exceptionally high CO2 efflux rates (5711 ± 844 Tg C year−1), while Jakkur Lake, currently under ecological rehabilitation, demonstrates substantially lower emissions (24 ± 10 Tg C year−1)42. While empirical evidence suggests that eutrophication could exert a modest positive influence on CO2 sequestration in non-urban lakes29,36, its role and mechanistic pathways in urban lakes remain inadequately elucidated. Consequently, to advance our comprehension of the underlying mechanisms regulating CO2 exchange in urban lakes, particularly shallow lakes, it is imperative to conduct more extensive field measurements and systematic analyses.

Inland water CO2 fluxes (fCO2) are predominantly controlled by the interplay between aqueous CO2 partial pressure (pCO2) and the rate of gas transfer (kCO2) across the water-air boundary in freshwater ecosystems17,43,44. The pCO2 parameter, recognized as a pivotal determinant in deciphering carbon cycle variability across urban water bodies at multiple spatial scales11,45, demonstrates multifactorial regulation through: (i) environmental drivers (e.g., solar irradiance)4,9,10, (ii) biogeochemical processes (particularly aquatic metabolism involving photosynthesis-respiration/P-R coupling)46, (iii) hydrological dynamics (including thermal stratification and mixing regimes)47,48,49, and (iv) allochthonous carbon inputs from catchment areas50,51,52,53.
Correspondingly, kCO2 variability depends primarily on wind shear stress and thermal conditions54. Distinct from the relative homogeneity of atmospheric pCO2, lacustrine pCO2 manifests pronounced spatiotemporal heterogeneity across daytime, monthly, and seasonal scales, with system-specific characteristics strongly influenced by morphometric parameters and trophic status4,5,6,55,56. Empirical evidence from Lake Ulansuhai, a shallow urban waterbody, reveals eutrophication-induced functional shifts from CO2 source to sink conditions57. Moreover, rapid urban expansion around lacustrine environments introduces substantial complexity to carbon cycling processes, with notable impacts on CO2 emission patterns5,58,59. Despite these insights, critical knowledge gaps persist regarding high temporal resolution characterization of pCO2 dynamics in urban shallow lakes, necessitating systematic investigations into urbanization-CO2 emission synergies through integrated observational and modeling approaches.

Building on this background, this work addresses a central scientific question: how and to what extent do eutrophic urban lakes modulate the dynamics of CO2 exchange across the water–air interface under high-frequency monitoring? To address this gap, we implemented a comprehensive study assessing daytime CO2 patterns in a shallow, subtropical urban lake with elevated nutrient levels in southwest China. This work pursues three principal aims: (i) performing monthly diurnal monitoring of pCO2/fCO2 and associated physicochemical variables between October 2020 and November 2021; (ii) identifying key hydrological and environmental controls on aquatic carbon cycling; and (iii) evaluating the premise that subtropical eutrophic lakes demonstrate alternating C-source-sink behavior under high-frequency investigation.
The findings of this work are expected to provide novel insights into the mechanisms governing CO2 exchange in lacustrine systems and to refine the quantification of CO2 emissions from urban lakes, thereby reducing uncertainties in regional/global C-budget assessments.

Results

Variations in pCO2
    Throughout the daily monitoring window (07:00–18:00 CST), sustained and significant (p < 0.05) daytime pCO2 decreases were observed on nine sampling days; the three exceptions (February 28, March 30, and November 22, 2021) exhibited gradual increasing trends (n = 1; Fig. 1A–L). Considering individual field investigation days, the declining pattern remained predominant across most sampling occasions, except those illustrated in Fig. 1D, E, I, J, L. The composite analysis of the four daily sampling times revealed a statistically significant (p < 0.05) reduction in mean pCO2 (n = 12) between 11:00 and 18:00 CST (Fig. 1M; Table 1). Relative to the 07:00 CST reference (2043.91 ± 2033.26 µatm), mean pCO2 showed a pronounced (p < 0.05) increase of 42.63% at 11:00 CST, followed by reductions of 49.18% (14:00 CST) and 43.42% (18:00 CST) relative to the same reference. However, the difference between morning (07:00–11:00 CST; 2479.56 ± 3971.80 µatm) and afternoon periods (14:00–18:00 CST; 1097.66 ± 803.29 µatm) was not significant (p > 0.05) in the combined dataset (n = 24; Fig. 2A).

    Fig. 1 Hourly variations of pCO2 (solid line) and fCO2 (dashed line) in the studied lake. Hollow circles denote individual measurements (n = 1; Fig. 1A–L) and mean values (n = 12; Fig. 1M–N). Field measurements originally scheduled for July 2021 were conducted on August 2 due to meteorological constraints.

    Fig. 2 Diurnal comparison of mean pCO2 (A), fCO2 (B), Twater (C), pH (D), Chla (E) and DO (F) between morning (07:00–11:00 CST) and afternoon (14:00–18:00 CST) periods (n = 24). Distinct lowercase letters denote statistically significant differences (p < 0.05) between temporal periods. The vertical line on the bars represents the mean ± standard deviation (SD).

    Table 1 Temporal correlation analysis of daily pCO2 and fCO2 in Bailuwan Lake (n = 4). x, time; y1, pCO2; y2, fCO2; r2, regression coefficients.

    Monthly analysis revealed unexpected temporal patterns, with peak pCO2 (n = 4; 8148.02 ± 1882.12 µatm) occurring in October 2021, contrasting sharply with the minimum recorded in October 2020 (198.14 ± 65.20 µatm). Throughout the annual cycle (January–November 2021), mean pCO2 during the first half of the year (January–June; n = 28; 1239.34 ± 781.00 µatm) was generally lower (p > 0.05) than that of the latter half (July–November; n = 16; 3143.15 ± 4673.69 µatm). Within 2021, January and March exhibited the lowest monthly means (510.20 ± 101.94 µatm and 314.92 ± 145.48 µatm, respectively; Fig. 3A). The overall daytime mean pCO2 across all measurements was 1788.61 ± 2919.44 µatm (n = 48).

    Fig. 3 Monthly variations of pCO2 (A) and fCO2 (B) at the water–air interface in Bailuwan Lake. Upper panel displays median (black line) and mean (red line, n = 4), with whiskers spanning the range between [Q1 − 1.5×IQR] (lower bound) and [Q3 + 1.5×IQR] (upper bound). Q1: first quartile; Q3: third quartile; IQR: interquartile range (Q3 − Q1). Oct.0 and Dec.0 denote October and December 2020 sampling campaigns, respectively; remaining data were obtained in 2021. The vertical line on the bars represents the mean ± standard deviation (SD).

Alterations in fCO2
    Similar to the pCO2 patterns, daytime mean fCO2 (n = 12) showed a significant (p < 0.05) initial increase followed by a decrease. Specifically, compared with the baseline measurement at 07:00 CST (139.16 ± 175.86 mmol m−2 h−1), fCO2 showed a 76.13% increase by 11:00 CST, followed by significant reductions of 55.93% and 45.78% at 14:00 and 18:00 CST, respectively, relative to the same baseline. Analysis of individual sampling days (n = 1) revealed consistent and statistically significant (p < 0.05) daytime fCO2 decreases on nine sampling dates, with the exception of February 28, March 30, and November 22, 2021 (Fig. 1A–L). Further, comparison between morning (07:00–11:00 CST; 192.14 ± 418.09 mmol m−2 h−1) and afternoon periods (14:00–18:00 CST; 68.39 ± 76.65 mmol m−2 h−1) showed no significant (p > 0.05) diurnal variation in mean fCO2 (n = 24; Fig. 2B).

    Monthly analysis revealed distinct temporal patterns, with negative daytime mean fCO2 values (n = 4) recorded in October 2020 and March 2021, contrasting sharply with the peak observed in October 2021 (n = 4; 775.16 ± 873.01 mmol m−2 h−1; Fig. 3B). The overall mean fCO2 was 130.26 ± 303.85 mmol m−2 h−1 (n = 48) across the entire study period.

Classification of trophic state

    Carlson's trophic state index (TSI) was assessed following the methodology in Method S1, incorporating key limnological parameters including total phosphorus (TP), total nitrogen (TN), water transparency (TPC), chlorophyll a (Chla), and chemical oxygen demand (CODMn) as detailed in Table S1 and Fig. 4. The comprehensive TSI(∑) analysis yielded a mean value of 63.04 (Table 2), categorizing the lake within the moderately-eutrophic classification according to the standard limnological criteria.
Temporal analysis of TSI(∑) dynamics revealed sustained values exceeding the eutrophication threshold of 60 across the majority of sampling intervals. However, notable deviations (p < 0.05) were observed during the spring sampling campaigns of March and April 2021, during which TSI(∑) values fell below this critical benchmark, as documented in Table S2.

Fig. 4 Temporal variations in water quality parameters of Bailuwan Lake: TPC (A,B), Chla (C,D), TN (E,F), TP (G,H), and CODMn (I,J). Additional details are provided in Fig. 3.

Table 2 Comprehensive TSI of Bailuwan Lake (n = 48). The table presents TN, TP, CODMn, total dissolved nitrogen (TDN) and eutrophication evaluation criteria, as detailed in Methods S2–S7.

Environmental parameters

Regarding diurnal variations, water quality parameters including aqueous temperature (Twater), TPC, dissolved oxygen (DO), chloride ion (Cl⁻), nitrate ion (NO3⁻), and sulfate ion (SO42⁻) exhibited a progressive increase from 07:00 to 18:00 CST (Figs. 4, 5 and 6). Notably, DO concentrations rose by 57.33% over this period, from baseline levels at 07:00 CST to maximum values at 18:00 CST. This trend was corroborated by comparative analyses, which revealed significantly (p < 0.05) elevated afternoon DO levels relative to morning measurements (Fig. 2F). Conversely, no statistically significant diurnal variations were detected in Twater, pH, or Chla between morning and afternoon sampling intervals (p > 0.05; Fig. 2C–E).

Fig. 5 Temporal variations in physicochemical parameters of Bailuwan Lake: Twater (A,B), pH (C,D), FNU (E,F), EC (G,H), DO (I,J), and TOC (K,L). Left panels (A,C,E,G,I,K): red lines indicate mean values per sampling time (n = 12); and right panels (B,D,F,H,J,L): black dots denote monthly means (n = 4) with whiskers representing mean ± SDs. Additional details are provided in Fig. 3.

Fig. 6 Temporal variations in anionic species of Bailuwan Lake: F− (A,B), Cl− (C,D), NO3− (E,F) and SO42− (G,H). Additional details are provided in Fig. 3.

Monthly monitoring revealed a shared temporal trajectory among several water quality indicators: TPC, turbidity (FNU), electrical conductivity (EC), anion concentrations (F−, Cl−, and SO42−), and metal levels (potassium/K, sodium/Na, chromium/Cr) all showed an initial ascending phase followed by measurable decreases. Conversely, Chla and TN demonstrated an initial decrease followed by an increase. The parameters pH and NO3− showed a gradual decline, while magnesium/Mg displayed a consistent upward trend (Figs. 4, 5, 6 and 7). Additionally, our analysis of monthly variations in CO32− and HCO3− levels revealed distinct patterns. CO32− generally exhibited an initial decrease followed by an increase, with a notable surge observed in March 2021. In contrast, HCO3− reached its lowest value in July 2021 (Fig. S1).

Fig. 7 Monthly changes in aquatic metals of Bailuwan Lake: K (A), Na (B), Mg (C), Cu (D), Zn (E), Fe (F), Mn (G), and Cr (H). Additional details are provided in Fig. 3.

Statistical analysis demonstrated inverse relationships of pCO2 with pH (p < 0.01) and DO (p < 0.05; Table 3), contrasted by a direct positive association with Chla (p < 0.05, Table 4). These correlations facilitated the development of predictive linear models linking pCO2 to the key parameters (pH, DO, Chla, solar radiation/SR, Twater, and CODMn), as visualized in Figure S2.

Table 3 Correlation analysis of pCO2/fCO2 with water quality parameters in the studied lake. * and ** denote significant correlations at the 0.05 and 0.01 levels (two-tailed test), respectively. Twater, water temperature; EC, electrical conductivity; TPC, transparency; FNU, turbidity; DO, dissolved oxygen; SR, solar radiation.

Table 4 Correlation analysis of pCO2/fCO2 with nutrient status indices in the studied lake.
Chla, chlorophyll a; TOC, total organic carbon; F−, fluoride; Cl−, chloride; NO3−, nitrate; SO42−, sulfate; TN, total nitrogen; TP, total phosphorus; CODMn, permanganate index. Additional details are provided in Table 3.

Discussion

Notable shifting patterns of C-sink-source in the investigated lake

According to recent data released by the United Nations, the global urbanization rate has increased from 30% in 1950 to 56% in 2020, and is projected to reach 68% by 205060,61. In China, more than 60% of the permanent population was urban by the end of 201922,38. Previous studies have demonstrated that urban lacustrine systems receive substantial inputs of exogenous labile organic carbon derived from diverse anthropogenic activities, which significantly enhances heterotrophic metabolism in these urbanized aquatic ecosystems, consequently resulting in elevated CO2 emissions. These mechanistic insights explain why metropolitan water bodies have been consistently documented42,62,63,64 as focal points for carbon release within anthropogenic landscapes. Consequently, investigating the mechanisms by which urban lakes respond to urbanization is of significant importance for predicting urban greenhouse gas emissions, particularly CO2.

Across all sampling days, pCO2 in the morning hours (07:00–11:00 CST; 2480 ± 3972 µatm; n = 24) was numerically higher than in the afternoon (14:00–18:00 CST; 1098 ± 803 µatm; n = 24; Figs. 1 and 2), but the difference was not statistically significant (p > 0.05). Conversely, afternoon DO concentrations were significantly higher than morning values (p < 0.05; Fig. 2F), whereas Chla levels remained stable without observable daytime fluctuations (p > 0.05; Figs. 2 and 4). These patterns align with prior work in local eutrophic lakes4,5,6.
Increased solar radiation after 07:00 CST spurs photosynthesis (P); later, declining light toward 18:00 CST shifts the balance toward respiration (R). The resultant daytime pCO2 pattern is documented in Table 1 and Fig. S3. Notably, temperature showed no significant direct effect on pCO2 fluctuations65 (Fig. 2C). Parallel observations by Potter and Xu in a subtropical North American lake revealed pronounced diurnal pCO2 dynamics, with predawn peaks and evening troughs58. Intriguingly, nocturnal CO2 efflux rates nearly tripled daytime values, underscoring distinct day-night emission patterns. In our dataset, anomalous pCO2 behavior was observed on February 28, March 30, and November 22, 2021, when morning pCO2 (07:00–11:00 CST) was slightly lower than afternoon levels (Fig. 1). Measurements from February 28 and November 22 consistently demonstrated pCO2 supersaturation, with all recorded values surpassing the characteristic atmospheric CO2 range (380–420 µatm)66,67. In contrast, significantly lower pCO2 levels were recorded on March 30, 2021 (315 ± 145 µatm) and October 30, 2020 (198 ± 65 µatm; n = 4; Fig. 1), reflecting dynamic C-sink-source transitions. The underlying mechanism may be attributed to extreme precipitation events acting as a trigger. On one hand, rainfall directly reduced pCO2 through the dilution effect. On the other hand, it introduced allochthonous nutrients and enhanced water column mixing, thereby stimulating intense algal blooms. The resulting high level of photosynthetic activity served as the key biological driver leading to significantly decreased pCO2. Furthermore, as an urban wetland, its hydrology is likely influenced by anthropogenic regulation.
The supplemental inflow of low-CO2 external water (e.g., reclaimed water) during this period may have further reinforced and amplified the decline in pCO26,15,68.

In our study, the overall diurnal mean pCO2 level in the investigated lake (n = 48; 1789 ± 2919 µatm) was significantly higher than typical equilibrium CO2 thresholds (p < 0.05) but markedly lower than the global lacustrine average pCO2 of 3230 µatm69. This finding indicates net CO2 supersaturation within the lake during our study, where in-lake CO2 production exceeded consumption, driving a net efflux of CO2 to the atmosphere and thus classifying the lake as a C-source31,44,70,71. These results align with prior investigations of eutrophic urban lakes, including Beihu Lake (~ 960 µatm)6 in the same metropolitan region, as well as Capitol Lake (~ 736 µatm)9 and University Lake (~ 630 µatm)48 in Louisiana, USA. However, the observed pCO2 levels were significantly elevated compared to our earlier findings in the same lake from January to September 2020 (~ 707 µatm)4. Importantly, pCO2 in October 2021 reached an exceptionally high value of 8148 ± 1882 µatm (n = 4; Fig. 3A), in a month not captured in our prior 2020 study4. A prior two-year comparative study of CO2 fluxes across different habitats in Lake Võrtsjärv likewise revealed pronounced spatial, seasonal, and interannual variability72. This anomaly underscores the potential for pronounced seasonal variability in water-air interface pCO2 dynamics, likely influenced by temporal environmental drivers48,55,59.

Previous work by Wang et al., investigating 43 eutrophic lakes across China’s climatic zones56, revealed pronounced seasonal variability in pCO2 across all studied systems, with lower mean values in summer and autumn, a pattern consistent with most lacustrine studies11,12.
This seasonality likely stems from synergistic effects of increased phytoplankton and submerged macrophyte biomass, coupled with thermal stratification, pH dynamics, solar radiation, and anthropogenic activities4,9,73,74. Specifically, pCO2 fluctuations in aquatic systems are governed by the balance between biological production (P; positive correlation with Chla in Table 4) and respiration (R; negative correlation with DO in Table 3)21,75. Thus, we reason that increased primary productivity during warm seasons could facilitate the transition of eutrophic lakes from net CO2 sources to sinks, whereas microbial and/or photochemical mineralization of organic carbon (e.g., TOC in Fig. 5) during cooler seasons may surpass photosynthetic CO2 uptake (also known as the biological C-pump-effect)76, thereby driving seasonal source-sink shifts6,77. Specifically, intense photosynthesis during warm seasons significantly consumes dissolved CO2, lowering pCO2 below atmospheric equilibrium and leading to CO2 influx; the lake functions as net autotrophic (i.e., an atmospheric C-sink) when gross primary production (GPP) exceeds the carbon released through ecosystem respiration. Moreover, our year-round monitoring (January–November 2021; Fig. 3A) showed lower mean pCO2 in the first half-year (1239 ± 781 µatm, January–June) compared to the latter period (3143 ± 4674 µatm, July–November), though both phases exceeded atmospheric equilibrium, confirming persistent CO2 supersaturation. Intriguingly, episodic CO2 undersaturation occurred in October 2020 (autumn; 198 ± 65 µatm) and March 2021 (spring; 315 ± 145 µatm), temporarily converting the system to a net sink (Fig. 3).
Collectively, these findings highlight diurnal/seasonal source-sink transitions in subtropical urban eutrophic lakes, modulated by dynamic biogeochemical drivers.

Drivers of CO2 uptake and release in response to environmental conditions

Further correlation analyses revealed statistically significant negative relationships (p < 0.05) between pCO2/fCO2 and both pH (−0.727**/−0.681**) and DO (−0.311*), alongside significant positive correlations with Chla (0.295* and 0.287*, respectively; Tables 3 and 4, and Figs. 5, S2). These findings underscore that CO2 uptake/emission dynamics in urban lakes are governed by multifaceted controls from aquatic environmental factors. In this moderately eutrophic autotrophic lake system, we posit that biological drivers, particularly Chla (as P) and DO (as R), play pivotal roles in modulating CO2 fluxes78. The enrichment of nutrients intensifies the coupling between CO2 fluxes and biogeochemical cycling, as demonstrated by strong correlations with both biological indicators (e.g., Chla) and chemical factors (e.g., TP, TN and pH; Table 2, and Fig. 4), rendering the carbon dynamics more responsive to environmental changes15,18. Specifically, enhanced organic matter mineralization increases the bioavailability of TP, thereby stimulating CO2 emissions14,49,79. The mechanistic linkage aligns with the observed inverse correlation between DO and pCO2 in our study.

Prior studies have demonstrated that pH regulates the physicochemical environment of lakes by mediating the dynamic equilibrium and spatial distribution of carbonate species (CO2, CO32−, and HCO3−), thereby influencing CO2 fluxes (quantified as pCO2 and fCO2) at the water-air interface4,80,81. This chemical control is particularly pronounced in lakes with elevated pH (> 8), where aqueous CO2 concentrations exhibit marked sensitivity to alkaline conditions12.
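The pH control on carbonate speciation described here can be illustrated with a minimal equilibrium sketch. It computes the fractions of dissolved inorganic carbon present as free CO2, HCO3−, and CO32− at a given pH. The dissociation constants (pK1 ≈ 6.35, pK2 ≈ 10.33) are textbook approximations for dilute freshwater near 25 °C and are assumptions for illustration, not values measured in this study; the function name `carbonate_fractions` is likewise hypothetical.

```python
# Illustrative carbonate speciation as a function of pH.
# Assumption: approximate freshwater constants near 25 degrees C
# (pK1 ~ 6.35, pK2 ~ 10.33); real values vary with temperature and ionic strength.
PK1 = 6.35
PK2 = 10.33


def carbonate_fractions(ph, pk1=PK1, pk2=PK2):
    """Return the equilibrium fractions of DIC present as CO2, HCO3-, and CO3 2-."""
    h = 10.0 ** (-ph)    # hydrogen ion activity
    k1 = 10.0 ** (-pk1)  # first dissociation constant
    k2 = 10.0 ** (-pk2)  # second dissociation constant
    denom = h * h + k1 * h + k1 * k2
    return {
        "CO2": h * h / denom,
        "HCO3": k1 * h / denom,
        "CO3": k1 * k2 / denom,
    }


if __name__ == "__main__":
    # pH values spanning the range reported for the lake (6.5 to 9.0)
    for ph in (6.5, 7.5, 8.6, 9.0):
        f = carbonate_fractions(ph)
        print(f"pH {ph:.1f}: CO2 {f['CO2']:.4f}, HCO3- {f['HCO3']:.4f}, CO3 2- {f['CO3']:.4f}")
```

Under these assumed constants the free-CO2 fraction falls steeply as pH rises across the observed range, which is the qualitative behavior the pH correlations point to; the sketch ignores temperature, ionic strength, and non-equilibrium biological effects.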
Mechanistically, higher pH promotes the conversion of free aqueous CO2 into carbonate ions, reducing pCO2 and creating undersaturation that enhances atmospheric CO2 absorption. Conversely, lower pH destabilizes dissolved inorganic carbon species, driving CO2 efflux to the atmosphere82. In our study, pH values ranged from 6.5 to 9.0, displaying distinct seasonal variability: maximal fluctuations occurred in autumn-winter (August–March), while minimal variability was observed in spring-summer (April–July; Fig. 5D). This pattern may reflect temperature-mediated modulation of hydrogen ion activity83, which is related to pH, even in the absence of statistical significance (R = −0.095 for Twater/pH in Table 3), a finding consistent with our earlier observations4. Notably, CO2 flux exhibits strong pH dependence across diverse lacustrine systems. For instance, global analyses of 196 saline lakes revealed that lakes with pH ≥ 9 typically function as weak CO2 sinks84, while a 14-year study of six hardwater lakes in Canada’s Northern Great Plains identified a critical pH threshold of 8.6 for source-to-sink transitions85. In our dataset, seasonal shifts between C-source-sink suggest the existence of a comparable pH threshold governing flux reversals, though its precise value requires further investigation.

Chla, a principal photosynthetic pigment in algae and phytoplankton, serves as a critical proxy for freshwater lake productivity79. Empirical studies have demonstrated that correlations between Chla concentrations and greenhouse gas emissions reflect the metabolic equilibrium of lacustrine ecosystems15,86,87, particularly in urban lakes influenced by industrial and domestic wastewater discharges88. In our investigation of a moderately eutrophic lake during non-bloom conditions, significant positive correlations were observed between Chla and both pCO2 (R = 0.295*) and fCO2 (R = 0.287*; Table 4), aligning with our previous findings from Beihu Lake in the same urban system6.
However, earlier studies on the lake failed to detect statistically significant associations (pCO2/Chla with R = 0.202; fCO2/Chla with R = 0.213)6. Such inconsistent patterns have been documented in other lacustrine systems. Interestingly, Xu and Xu reported substantial spatiotemporal variability in Chla concentrations in University Lake89, a small urban waterbody in the southern United States, yet subsequent CO2 investigations at the same site revealed no significant Chla-CO2 relationships48. Recent analyses of 33 small artificial lakes by Wang et al.64 further demonstrate this complexity: spring algal blooms in over half of these systems resulted in annual mean Chla concentrations exceeding 10 µg L−1, coinciding with enhanced CO2 sequestration. These paradoxical findings likely stem from multiple interacting mechanisms, including vertical and lateral chemical transport dynamics in open aquatic systems, phytoplankton community composition and density variations, and methodological challenges in modeling Chla-CO2 interactions6,86. Among those, solar radiation exerts a critical regulatory influence on Chla concentrations in urban lacustrine systems through its modulation of phytoplankton photosynthetic activity89,90. Diurnally, photosynthetic efficiency follows a parabolic trajectory: morning radiation intensification stimulates photosynthetic activation, reaching maximum capacity at solar noon before subsequent photoinhibition reduces both photosynthetic performance and Chla levels through late afternoon (ref., Fig. 2E)91,92. Seasonally, spring radiation intensification creates optimal photothermal conditions for phytoplankton proliferation, enhancing lacustrine carbon sequestration potential. 
Conversely, summer hyper-radiation events coupled with elevated water temperatures may induce thermal stress responses, potentially triggering CO2 re-release mechanisms that could transition these aquatic systems from C-sink to C-source58,93.

DO, serving as a critical indicator of aquatic metabolic activity, is regulated by multiple environmental drivers including water temperature, organic matter, biological photosynthesis and water dynamics. Its dynamic equilibrium reflects compensatory atmospheric exchange and respiratory consumption processes91,94,95. In moderately eutrophic lakes (Chla < 30 µg L−1), algal growth exhibits relative moderation compared to hyper-eutrophic systems, where biogeochemical processes (particularly N–P interactions involving NO3− and NH4⁺) dominate over Chla in regulating fCO2 dynamics15,96. This regulatory dominance is particularly pronounced in urban lacustrine systems along anthropogenic disturbance gradients, where significant linear correlations between nutrient concentrations and fCO2 variations have been documented12,97. Empirical evidence from Taihu Lake demonstrates that anthropogenic nutrient loading elevates CO2 emissions through enhanced aquatic respiration40. While theoretical models suggest nutrient enrichment could reduce CO2 emissions via boosted primary production, most field observations indicate that N and P inputs predominantly amplify CO2 release through stimulation of heterotrophic respiration in both water column and sediments17,35,42,98. Our findings reveal a significant negative correlation between DO and pCO2 (p < 0.05; Table 3), indicating that, in addition to biological factors (where elevated productivity reduces CO2 efflux, leading to higher pH, increased oxygen, and lower pCO2), anthropogenically mediated heterotrophic respiration may also serve as a driver of dissolved CO2 supersaturation68,99. Further, Zhang et al.
propose that DO (or apparent oxygen utilization, AOU) may serve as a more direct predictor of CO2 variability than Chla in moderately eutrophic lakes during non-bloom periods, demonstrating greater independent explanatory power for fCO2 fluctuations at the water-air interface15.

Moreover, in shallow lakes where hydrodynamics are the dominant force, the movement and mixing of water constitute the core physical mechanism regulating the oxygen (O2) budget6. The continuous inflow of riverine water imparts kinetic energy, and the resulting turbulence significantly enhances gas exchange efficiency at the water-air interface, thereby promoting O2 input9. Concurrently, such hydraulic disturbance effectively disrupts thermal stratification, leading the lake toward a fully mixed state50. This process transports O2-rich surface waters to the lake bottom, preventing the formation of hypoxic conditions in the benthic zone. Further, hydrodynamics govern a relatively short hydraulic retention time, which not only exports partially decomposed organic matter18,27, reducing the internal O2 demand, but also continually replenishes nutrients to support moderate levels of photosynthetic O2 production. Therefore, through these three synergistic mechanisms (enhancing reaeration, optimizing vertical distribution, and reducing net consumption), intense hydrodynamic processes help maintain the high-O2 equilibrium in shallow lakes.

Uncertainties in CO2 evasion from subtropical lake systems and future work

Our synthesis demonstrates that CO2 exchange rates in anthropogenically impacted subtropical lakes vary considerably (−15 to 130 mmol m−2 h−1; Table 5), reflecting strong geographic and temporal dependencies in emission patterns. The mean fCO2 in our study markedly exceeded values reported for urban-dominated lakes in other regions, even surpassing previous findings by Yang et al.4 for the same lake system (29 ± 67 mmol m−2 h−1). 
However, our results align with the previous observations in Qinglonghu Lake (a moderately eutrophic urban lake in the same region; 108 ± 101 mmol m−2 h−1)10, likely attributable to intensified anthropogenic pollution inputs and aggravated eutrophication in our study lakes during the monitoring period. Notably, fCO2 values from the highly urbanized tropical Rio Grande Reservoir in South America95, i.e., 5.14 and 3.18 mmol m−2 h−1 for hypereutrophic and moderately eutrophic zones, respectively, were substantially lower than our measurements for this moderately eutrophic lake. These discrepancies may reflect not only differences in eutrophication status and nutrient levels but also hydrological seasonality, as exemplified by the previous work on subtropical University Lake in Louisiana, USA48. While local anthropogenic forces are often the dominant driver of CO2 dynamics in human-modified lakes, we hypothesize that a portion of the residual uncertainties in regional-scale CO2 emission estimates for subtropical lakes is associated with patterns of regional climate change15,31. Specifically, monsoon-driven climatic patterns characterized by high temperatures and/or heavy rainfall amplify interannual fluctuations in hydrological conditions (e.g., rainfall-evaporation balance), thereby modulating the dissolution-release equilibrium of CO2.

Previously, Zhang et al.15 demonstrated that most lakes across diverse geographical regions, including Taihu Lake, Lake Guadalcacín, Lake Bornos, Lake Alexandrina, and urban artificial ponds, function as atmospheric CO2 sources, with annual mean fCO2 increasing alongside Chla concentrations (Table 5). These comparative findings further emphasize the variability of CO2 dynamics across interannual, seasonal, and hourly scales6,58,100. 
For instance, seasonal analyses revealed a C-source-sink alternation pattern in subtropical urban lakes, where winter and spring (periods of low algal biomass) exhibit CO2 production and release, while elevated summer primary productivity facilitates a transition to CO2 sinks18. Contrary to this typical seasonal pattern, our study observed no strict summer-sink and winter/spring-source dynamic. Despite two months of pCO2 levels significantly below atmospheric equilibrium (Fig. 3), the annual mean pCO2 (1789 µatm) remained markedly supersaturated, indicating persistent CO2 emissions. Further, discrepancies emerged when comparing our results with the previous findings of Yang et al.4 for the same lake system (707 µatm), suggesting that such divergence in seasonal patterns contributes to uncertainties in urban lake CO2 emission assessments. Similarly, Potter and Xu58 highlighted that while summer predominantly manifests as a C-source and winter as a sink, spring sampling may prove valuable for assessing CO2 evasion dynamics in shallow trophic lakes. We hypothesize that early spring represents a critical transitional window; our future studies will accordingly prioritize systematic monitoring of this period.

Prior studies have established that while long-term pCO2 variations (days to months) significantly influence evasion rate estimates, gas transfer velocity (k600) emerges as a critical regulator of diurnal CO2 evasion at shorter timescales (minutes to hours)100,101. Building on this understanding, both previous studies and our work observed distinct diurnal water-air CO2 gradients from dawn to dusk, yet both likely underestimated pCO2 owing to insufficient consideration of nocturnal dynamics. Through 24-h monitoring of subtropical urban lakes, a previous study revealed pronounced diurnal fluctuations in pCO2 and CO2 degassing, particularly marked by nocturnal pCO2 surges58. 
Their findings identified 10:00 and 22:00 CST as periods of minimal deviation from daily mean pCO2, recommending optimized sampling between 09:00–11:00 CST to balance accuracy and operational feasibility. Our study validated the efficacy of sensor-based continuous monitoring in lacustrine systems. Similarly, Wang et al.55 demonstrated that daytime CO2 fluxes in Tangxun Lake (ca. 8 mmol m−2 day−1) were markedly lower than nocturnal fluxes (ca. 10 mmol m−2 day−1), with 11:00–12:00 measurements best approximating daily means. It could therefore be inferred that during nighttime hours, when photosynthetic activity is minimal, enhanced CO2 evasion is likely to occur. Nevertheless, only a limited number of studies have focused on nocturnal CO2 release dynamics. For instance, analysis of long-term limnological data (1987–2006) from Lake Apopka, Florida, by Gu et al.102 revealed consistently higher average partial pressure of CO2 (224 µatm) at nighttime compared to daytime levels. Similarly, Reis and Barbosa103 reported significantly elevated mean nocturnal pCO2 (565 µatm) relative to daytime values (436 µatm) in a tropical productive lake in southeastern Brazil. More recently, Reiman and Xu104 further corroborated a consistent daytime pattern of pCO2 in the Lower Mississippi River, characterized by a peak prior to sunset and a minimum during periods of maximum solar irradiance. These collective findings underscore the critical role of temporal resolution in evasion rate quantification.

Methodologically constrained by funding limitations, manpower shortages, and site accessibility challenges, our study lacked nocturnal monitoring and employed a limited sample size. These findings therefore provide preliminary insights into CO2 dynamics at the water-air interface and represent an initial exploratory step. In particular, the following challenges still need to be addressed in our current work:

    (i)

    This study conducted high-frequency monitoring at a representative site located 2 m from the lake’s shoreline. This location was selected for its capacity to reflect the dominant air-water exchange processes in the lake’s open waters while avoiding known, strong point-source disturbances. Although spatial variability of CO2 concentrations in the well-mixed central basin of this moderately sized urban lake is likely limited in the absence of major perturbations, we acknowledge that a single sampling point cannot fully capture the potential spatial heterogeneity of the entire lake. Further, its proximity to the shore may not adequately represent biogeochemical processes in the pelagic zone. For instance, nearshore areas susceptible to groundwater inflow or sediment resuspension may develop localized pCO2 hotspots, which were not directly monitored in this study design. Additionally, the presence of submerged aquatic vegetation in the sediments at the sampling site can influence dissolved CO2 concentrations and their diel fluctuations. The spatial heterogeneity of aquatic vegetation introduces uncertainty when extrapolating discrete point measurements to whole-ecosystem fCO2. While the current sampling strategy adheres to standard protocols, it may not fully represent the metabolic diversity across different habitats. In essence, the net CO2 flux in vegetated areas represents a dynamic balance between photosynthetic uptake and respiration, which varies nonlinearly with environmental conditions. Moreover, the inhibitory effect of vegetation canopies on the gas transfer velocity (k) is often overlooked in flux calculations, potentially leading to systematic overestimation in these areas. Consequently, the CO2 values reported herein should be interpreted as the best available estimate of the dominant fluxes in the lake’s nearshore zone, rather than an absolute and precise whole-lake average. 
Future investigation should aim to better quantify this spatial variability and its impact on the integrated lake carbon flux budget by deploying more extensive sensor networks and integrating high-resolution pCO2 mapping with habitat-specific parameterization of k.

    (ii)

    As noted previously, safety and technical constraints prevented our high-frequency monitoring from covering the nocturnal period. This may introduce uncertainty in our estimates of diel CO2 fluxes, particularly in quantifying nighttime CO2 emissions dominated by ecosystem respiration, potentially leading to an underestimation of total daily CO2 emissions. To assess this uncertainty and provide reasonable flux estimates to the extent possible, we propose that the daily mean flux can be extrapolated based on a conservative estimate of nighttime flux, for instance, by assuming that nighttime fluxes are similar to the lower flux levels observed around sunset. An analysis of the potential systematic bias introduced by the absence of nighttime data indicates that, even in seasons when the lake consistently acts as a CO2 source, and under a worst-case scenario where nighttime fluxes reach daytime peak levels, the core conclusion, that the lake may shift from a CO2 source to a sink in certain seasons (e.g., during summer algal blooms), remains valid. This is because the strong daytime photosynthetic uptake is sufficient to offset the estimated upper bound of respiratory emissions at night. Therefore, we emphasize that the key finding of this study, the observed source-sink transition dynamics of the lake, is primarily driven by strong daytime biogeochemical processes (i.e., photosynthesis vs. respiration/chemical equilibria). Although the lack of direct nighttime observations introduces a degree of uncertainty, it does not undermine our understanding of the principal mechanisms driving these dynamics. Future investigations should place greater emphasis on investigating CO2 dynamics during the nighttime.

    (iii)

    The pCO2 in this study was calculated from pH, temperature, and alkalinity using thermodynamic equilibrium equations. This approach may introduce significant deviations under high dissolved organic carbon (DOC) or extreme pH conditions, potentially leading to overestimation of pCO2. Combining direct in-situ measurements with multi-method comparisons is therefore necessary to reduce uncertainties and enhance data reliability. Moreover, constraining the precise contribution of allochthonous carbon inputs to the lake’s CO2 emissions entails considerable uncertainties. The pulsed nature of carbon delivery during rainfall events is poorly captured by our monthly sampling, likely leading to an underestimation of episodic inputs. Further, the heterogeneous composition and bioavailability of imported carbon (e.g., labile DOC from sewage vs. refractory DOC from soil erosion) make it difficult to predict its mineralization efficiency and thus its ultimate contribution to CO2 evasion. Disentangling the effects of external carbon from in-lake processes remains a major challenge, as these drivers are often coupled (e.g., nutrient inputs stimulating productivity that consumes CO2).
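The extrapolation proposed in item (ii) above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes the four daytime fluxes are integrated with the trapezoidal rule and that the unobserved night is filled with a single constant flux (by default the last daytime value, as a stand-in for "flux levels observed around sunset"). The function names and the flux values are hypothetical.

```python
def daytime_integral(hours, fluxes):
    """Trapezoidal integral of flux over the sampled daytime window (flux x hours)."""
    total = 0.0
    for i in range(len(hours) - 1):
        dt = hours[i + 1] - hours[i]
        total += 0.5 * (fluxes[i] + fluxes[i + 1]) * dt
    return total


def daily_mean_flux(hours, fluxes, night_flux=None):
    """Time-weighted 24-h mean flux with an assumed constant nighttime flux.

    If night_flux is None, the last daytime value (around sunset) is used,
    which is the conservative assumption described in the text.
    """
    if night_flux is None:
        night_flux = fluxes[-1]
    day_span = hours[-1] - hours[0]   # hours covered by sampling
    night_span = 24.0 - day_span      # hours without observations
    return (daytime_integral(hours, fluxes) + night_flux * night_span) / 24.0


# Sampling times from the study design (CST); flux values are made up for illustration.
HOURS = [7, 11, 14, 18]
FLUXES = [10.0, 6.0, 4.0, 5.0]  # mmol m-2 h-1, hypothetical

conservative = daily_mean_flux(HOURS, FLUXES)                         # night = sunset value
worst_case = daily_mean_flux(HOURS, FLUXES, night_flux=max(FLUXES))   # night = daytime peak
print(f"conservative daily mean: {conservative:.2f} mmol m-2 h-1")
print(f"worst-case daily mean:   {worst_case:.2f} mmol m-2 h-1")
```

Comparing the two results gives a rough bracket on how much the missing nighttime data could shift the daily mean under these assumptions.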

    Overall, future work requires standardized protocols incorporating temporal (diurnal/seasonal) and spatial (cross-regional urban lake selections) dimensions to reduce uncertainties in CO2 flux estimates for urban lacustrine systems. Given the critical role of eutrophication in modulating C-source-sink transitions in urbanized lakes, we propose implementing dual-objective environmental management strategies: pollution mitigation (intercepting pollutant inputs through watershed management) and C-conscious restoration (optimizing hydrophyte communities through selection of C-sequestering species, biodiversity enhancement, and spatial configuration optimization). These measures should be prioritized in subtropical regions prone to algal blooms and climatic warming, aiming to simultaneously improve water quality and align lacustrine carbon budgets with global carbon neutrality targets.

    Table 5 Global comparison of CO2 flux heterogeneity in subtropical lakes.

    Conclusions

    This work employed a high-frequency observational program to examine temporal and spatial patterns of pCO2 and fCO2 in relation to key environmental variables within a subtropical urban lake with moderate eutrophic status. Temporal analysis identified consistent afternoon reductions in aquatic CO2 parameters during daytime (07:00–18:00 CST), contrasting with anomalous measurements recorded on three specific dates. However, no statistically significant differences in pCO2/fCO2 levels were detected between morning and afternoon (p > 0.05, n = 24). During the investigation, the average pCO2 and fCO2 levels were 1788.61 µatm and 130.26 mmol m−2 h−1 (n = 48), respectively. Monthly analyses revealed substantial variability in both parameters, with pCO2 ranging from 198.14 to 8148.02 µatm, and fCO2 from −16.14 to 775.16 mmol m−2 h−1 (n = 4, monthly). 
The annual cycle (January–November 2021) showed significantly lower mean pCO2 during the first half-year (1239.34 µatm, January–June) compared to the latter period (3143.15 µatm, July–November); both values exceeded atmospheric equilibrium, so the lake functioned as a net CO2 source in both periods. Crucially, episodic CO2 undersaturation occurred in October 2020 (198.14 µatm) and March 2021 (314.92 µatm), temporarily converting the system to a carbon sink. Statistical analyses identified pH, Chla (P), and DO (R) as key environmental drivers of pCO2 and CO2 flux variability, while eutrophication status and anthropogenic disturbances critically modulated source-sink transitions. These findings highlight the urgent need for improved management strategies in urban lake systems, such as reducing pollutants and mitigating carbon emissions, supported by standardized protocols that account for temporal (especially nocturnal), seasonal, and regional variations. Such integrated approaches will enhance the accuracy of CO2 flux estimates and contribute to global carbon neutrality goals.

Methods

Site description

The study was conducted at Bailuwan Lake (104° 7′ 40.5″ E, 30° 34′ 56.01″ N), an urban water body situated in the peri‑urban transition zone of Chengdu, Sichuan, China (Fig. S4). This artificially established aquatic system functions as an integrated urban eco‑wetland complex under the management of the Jinjiang District Government, combining tourism with ecological conservation. Designated in 2017 as Chengdu’s first National Urban Wetland Park, the lake represents a typical urban water body in a western Chinese megacity and offers a valuable case study for examining common features and challenges, such as ecological functions, environmental pressures, and management practices, of urban lakes globally.

The lake covers a total surface area of 200 ha, with open water bodies accounting for approximately 33.5% of the area and exhibiting depth gradients ranging from 0.5 to 6.5 m. 
According to our previous study based on 2021 data, vegetation represented the largest land cover type (56.2%) in the study area, followed by bare land, lake surface, and roads (6.8%)105. Hydrologically, the main inflow is an engineered tributary of the Dongfeng Canal system, while water export occurs mainly through evaporation and controlled discharge via constructed drainage infrastructure.

The study area is situated within a humid subtropical monsoon climate zone, characterized by pronounced seasonal thermal variability. Meteorological records indicate a mean annual temperature of 16.5 °C, with a distinct seasonal pattern featuring the lowest monthly temperatures in January and peak values during July–August, aligning with the broader regional climate regime. Throughout the monitoring campaign, diurnal air temperature fluctuations in the lake vicinity were substantial, varying from 0 °C to 35 °C (Fig. S3A), reflecting strong day-night thermal dynamics. In addition, the region experiences considerable solar exposure, with an annual cumulative solar radiation measuring approximately 161 kJ cm−2 (Fig. S3B). This high level of irradiance plays a critical role in driving both hydrological and ecological processes, underscoring the distinctive energy budget setting of this subtropical lacustrine environment.

Field measurements

This investigation employed a monthly field monitoring protocol to collect essential hydrological parameters from October 2020 through November 2021, with the exception of November 2020 and September 2021 due to logistical constraints. Additionally, the scheduled July 2021 field campaign was administratively rescheduled to August 2, 2021. In other words, a total of 12 in-situ measurements were collected over 13 months of investigation. 
To ensure methodological rigor and data reliability, a triplicate sampling approach (n = 3) was systematically implemented for each field collection event, with samples subsequently subjected to both in-situ measurements and comprehensive laboratory analyses.

To maintain rigorous data quality standards and ensure cross-comparability, standardized sampling was performed at four fixed time points daily (07:00, 11:00, 14:00, and 18:00 CST) using a plastic grab-sampler at a depth of 30–50 cm below the water surface. All trips were made on sunny days to minimize rainfall and/or stormwater runoff effects on water conditions. Moreover, one objective of this study was to capture CO2 dynamics in the near-shore shallow water area, a typical ecotone, whose distinct characteristics are often homogenized and overlooked in whole-lake scale studies. Therefore, the sampling site was selected 2 m (Fig. S4) from the shore primarily because this location represents a sensitive zone for CO2 exchange at the water-air interface and is also significantly influenced by anthropogenic activities (e.g., surface runoff and input from riparian vegetation). Further, our investigation revealed the presence of submerged aquatic vegetation, such as Ceratophyllum demersum and Potamogeton distinctus, distributed on the lakebed directly below the sampling point.

Water transparency was assessed through Secchi disk measurements (TPC; cm), employing a standardized 20-cm diameter disk, with concurrent turbidity determinations (FNU; NTU) performed using a calibrated HACH-TSS turbidimeter (Danaher Corporation, Washington, DC, USA). Concurrently, a Hanna-HI9829/HI98186 multiparameter probe (Hanna Instruments, Italy) was deployed for synchronous in situ measurements of fundamental water characteristics: pH, DO (mg L−1), EC (µS cm−1), and Twater (°C). 
The quantification of pCO2 required precise determination of carbonate (CO32−; CB; mol mL−1) and bicarbonate (HCO3−; BCB; mol mL−1) concentrations via acid-base titration, utilizing phenolphthalein and methyl orange as dual-endpoint indicators in accordance with the standardized analytical procedure (Method S2).

Additionally, during the dynamic monitoring phase, continuous in-situ measurements of water quality parameters were recorded only after the readings had stabilized and remained consistent for an additional 5 min. Stabilization typically required 15–30 min. For water samples intended for TOC and anion analysis, filtration through a 0.45 μm micropore membrane filter was performed prior to storage in pre-cleaned polyethylene bottles. All samples were stored in high-density polyethylene bottles that had been pre-acid washed, tightly sealed, and carefully inspected to prevent gas exchange. During transportation, samples were placed in coolers with sufficient wet-ice to maintain preservation conditions.

Furthermore, comprehensive laboratory analyses were conducted on water samples to quantify multiple physicochemical parameters. The measured parameters encompassed: (i) Chla (mg m−3) quantified per the National Environmental Protection Agency (NEPA) standards106, (ii) nitrate (NT; mg L−1) analyzed following Method S3, and (iii) TP (mg L−1) assessed via Method S4. Additionally, TOC was fractionated into total carbon (TC) and inorganic carbon (IC) components (mg L−1) using validated methodologies4,5. Additional parameters assessed were TDN (mg L−1; Method S5) and CODMn (mg L−1; Method S6). Anionic species (F−, Cl−, SO42−, NO3−; mg L−1) were quantified following previously validated methodologies4,5,6. 
The trophic status of water bodies was evaluated through the Carlson’s TSI (trophic state index; Method S1).

Besides, given its location within an urban setting, the concentrations and distribution patterns of metals in Bailuwan Lake could serve as sensitive tracers of anthropogenic disturbances, such as discharges from urban wastewater and surface runoff. These indicators assist in evaluating the potential influence of allochthonous carbon inputs on the overall carbon balance of the system87. For instance, the speciation and solubility of metals such as Fe and Mn are strongly dependent on ambient redox conditions45,87. Variations in their concentrations and ratios could be used to infer dominant pathways of organic matter degradation (e.g., Fe and Mn reduction processes), which are closely linked to the production and consumption of greenhouse gases including CO2. Therefore, in this study, we also measured trace metal concentrations (Cu, Zn, Cr, Fe, Mn, K, Ca, Na, Mg; µg L−1; see Method S7).

To ensure sample integrity, water specimens were collected in amber glass bottles and maintained at 4 °C during transportation. All field sampling procedures were strategically conducted during periods of meteorological stability (clear weather conditions) to eliminate potential confounding effects from precipitation or surface runoff on analytical outcomes.

Laboratory analyses

Calculation of pCO2
    Extensive studies have demonstrated that the equilibrium distribution of aqueous carbonate species, encompassing bicarbonate, carbonate, carbonic acid, and dissolved CO2, is principally governed by a suite of physicochemical parameters, specifically hydrogen ion activity (pH), aqueous temperature (Twater), and ionic strength (IS) of the solution4,23,107,108. Building upon this empirical foundation, the present study employs a thermodynamic carbonate speciation model (Eqs. 1–4) to computationally derive the aqueous carbon dioxide partial pressure (pCO2, expressed in µatm), with rigorous implementation protocols detailed in Method S8.

    $$p{\text{CO}}_{2} = \left[ {\text{H}}_{2}{\text{CO}}_{3} \right]/K_{\text{CO2}} = \alpha\left( {\text{H}}^{+} \right) \cdot \alpha\left( {\text{HCO}}_{3}^{-} \right)/\left( K_{\text{CO2}} \cdot K_{1} \right)$$
    (1)
    $$\alpha\left( {\text{H}}^{+} \right) = 10^{-[{\text{pH}}]}$$
    (2)
    $$\alpha\left( {\text{HCO}}_{3}^{-} \right) = \left[ {\text{HCO}}_{3}^{-} \right] \times 10^{-0.5\sqrt{I}}$$
    (3)
    $$I = 0.5\left( \left[ {\text{K}}^{+} \right] + 4\left[ {\text{Ca}}^{2+} \right] + \left[ {\text{Na}}^{+} \right] + 4\left[ {\text{Mg}}^{2+} \right] + \left[ {\text{Cl}}^{-} \right] + 4\left[ {\text{SO}}_{4}^{2-} \right] + \left[ {\text{NO}}_{3}^{-} \right] + \left[ {\text{HCO}}_{3}^{-} \right] \right)/1000000$$
    (4)
    where the terms α(H+) and α(HCO3−) represent the chemically active fractions of hydrogen [H+] and bicarbonate [HCO3−] ions, accounting for non-ideal solution effects, whereas I quantifies the cumulative electrostatic environment through ionic strength.

    Furthermore, it should be noted that the concentration of CO2 in the lake water was calculated from bicarbonate alkalinity, using pH and temperature as the relevant thermodynamic parameters. However, this computational approach could lead to significant overestimation of CO2 under high DOC conditions (> 200 µmol L−1). Therefore, prior to each measurement, pre-screening was conducted to ensure that the hydrochemical conditions of the lake, including pH, alkalinity, and DOC (or TOC in this study) concentration, remained within a “safe range” for reliable estimation.

    Estimation of fCO2
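Because the flux estimate takes pCO2 as its input, it may help to see the speciation calculation of Eqs. (1)–(4) written out as code. The sketch below is illustrative only. It assumes ion concentrations are supplied in µmol L−1 (suggested by the division by 1000000 in Eq. 4), so that pCO2 comes out in µatm. The equilibrium constants K_CO2 and K_1 are temperature dependent and are defined in Method S8, which is not reproduced here, so the values used are generic textbook figures near 25 °C and not the study's values. The sample composition is hypothetical.

```python
import math

# Ion charges for the species in Eq. (4); the weights 1 and 4 are the squared charges.
CHARGES = {"K": 1, "Ca": 2, "Na": 1, "Mg": 2, "Cl": -1, "SO4": -2, "NO3": -1, "HCO3": -1}


def ionic_strength(conc_umol_per_l):
    """Eq. (4): I = 0.5 * sum(z_i**2 * c_i) / 1e6, with c_i assumed to be in umol/L."""
    total = sum(CHARGES[ion] ** 2 * c for ion, c in conc_umol_per_l.items())
    return 0.5 * total / 1_000_000


def pco2_uatm(ph, hco3_umol_per_l, ionic_str, k_co2, k_1):
    """Eqs. (1)-(3): pCO2 (uatm) from pH and bicarbonate concentration (umol/L)."""
    a_h = 10.0 ** (-ph)                                               # Eq. (2)
    a_hco3 = hco3_umol_per_l * 10.0 ** (-0.5 * math.sqrt(ionic_str))  # Eq. (3)
    return a_h * a_hco3 / (k_co2 * k_1)                               # Eq. (1)


# Hypothetical water sample (umol/L); not data from the study.
SAMPLE = {"K": 100, "Ca": 1000, "Na": 500, "Mg": 300,
          "Cl": 400, "SO4": 200, "NO3": 50, "HCO3": 2000}

# Illustrative constants near 25 C (assumed; the study takes them from Method S8).
K_CO2 = 10.0 ** -1.47   # Henry's constant, mol L-1 atm-1
K_1 = 10.0 ** -6.35     # first dissociation constant of carbonic acid

I_SAMPLE = ionic_strength(SAMPLE)
print(f"I = {I_SAMPLE:.6f} mol/L")
print(f"pCO2 at pH 8.0 = {pco2_uatm(8.0, SAMPLE['HCO3'], I_SAMPLE, K_CO2, K_1):.0f} uatm")
```

Since α(H+) enters Eq. (1) linearly, each 0.1 unit drop in pH raises the computed pCO2 by about 26%, which is why small pH errors dominate the uncertainty of this approach.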
    Empirical studies have systematically demonstrated that the interfacial CO2 flux across the aquatic-atmospheric boundary layer is predominantly regulated by a complex interplay of environmental variables, including thermal conditions (temperature), solute concentration (salinity), atmospheric turbulence (wind speed), and the pCO269,109. In accordance with these established principles, we implemented a mechanistic transfer model (Eq. 5) to quantify the net fCO2 at the water-air interface, with flux density expressed in standardized units of mmol m−2 h−1, following the comprehensive methodological framework in Method S9.

    $$f{\text{CO}}_{2} = K_{\text{T}} K_{\text{H}} \left[ p{\text{CO}}_{2({\text{water}})} - p{\text{CO}}_{2({\text{air}})} \right]$$
    (5)
    where fCO2 quantifies the net exchange rate of CO2 per unit area at the water-air boundary interface. The parameters KH and KT correspond to the temperature-dependent Henry’s law constant (quantifying CO2 solubility) and the gas transfer velocity (characterizing the water-air exchange coefficient), respectively. The determination of KH, following established thermodynamic principles, incorporates a multivariate dependence on physicochemical conditions, specifically thermal regime (temperature), ionic strength (salinity), and hydrostatic pressure, as comprehensively characterized in the seminal work of Weiss110.

    $$K_{\text{H}} = {\text{e}}^{[A_{1} + A_{2}(100/T) + A_{3}(T/100)]}$$
    (6)
    Moreover, the gas transfer velocity normalized to a Schmidt number of 600 (K600) was transformed to the CO2-specific gas transfer velocity (KT) using the functional relationship expressed in Eq. (7). This transformation incorporates the well-established functional dependence between the Schmidt number (Sc) and gas exchange kinetics, thereby facilitating the estimation of KT across a range of environmental conditions, as originally demonstrated in the foundational work of Jahne et al.111

    $$K_{\text{T}} = K_{600} \times \left( \frac{600}{{\text{Sc}}_{{\text{CO}}2}} \right)^{n}$$
    (7)
    where the exponent of the Schmidt number, denoted as n, varies with wind speed conditions. The exponent n adopts a value of 0.5 for wind speeds > 3.7 m s−1 and 0.75 for calmer conditions (< 3.7 m s−1), as established by Guérin et al.112 For the Schmidt number parameterization, we applied the widely accepted value of 0.67 following Cole and Caraco’s experimental determinations under reference conditions107. Furthermore, the computational algorithm for K600, as expressed in Eq. (8), accompanied by its comprehensive methodological elucidation, is presented in Method S9. In this study, U10 denotes wind speed values corrected to the conventional 10-m reference height over the water body at sampling time.

    $$K_{600} = 2.07 + 0.215\,U_{10}^{1.7}$$
    (8)
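The flux calculation in Eqs. (5), (7), and (8) can be sketched as follows. This is a hedged illustration rather than the authors' implementation (which is described in Method S9). Several choices are assumptions: K600 is taken to be in cm h−1 (the usual convention for the wind-speed relation in Eq. 8), the Schmidt number of CO2 in fresh water is computed with the Wanninkhof (1992) polynomial, the default exponent n = 0.67 reads the text's "value of 0.67" as the Schmidt number exponent, and KH is passed in as a number (mol L−1 atm−1) instead of being computed from Eq. (6). The function names and input values are hypothetical.

```python
def k600_cm_per_h(u10):
    """Eq. (8): K600 = 2.07 + 0.215 * U10**1.7, with U10 in m/s (K600 assumed in cm/h)."""
    return 2.07 + 0.215 * u10 ** 1.7


def schmidt_co2(t_celsius):
    """Schmidt number of CO2 in fresh water (Wanninkhof 1992 polynomial; an assumption)."""
    t = t_celsius
    return 1911.1 - 118.11 * t + 3.4527 * t ** 2 - 0.04132 * t ** 3


def k_t_cm_per_h(u10, t_celsius, n=0.67):
    """Eq. (7): K_T = K600 * (600 / Sc_CO2)**n."""
    return k600_cm_per_h(u10) * (600.0 / schmidt_co2(t_celsius)) ** n


def fco2_mmol_m2_h(pco2_water, pco2_air, u10, t_celsius, k_h, n=0.67):
    """Eq. (5): fCO2 = K_T * K_H * (pCO2_water - pCO2_air), returned in mmol m-2 h-1.

    pCO2 values are in uatm and k_h in mol L-1 atm-1, so k_h * delta is in
    umol/L, which equals mmol/m3; K_T is converted from cm/h to m/h.
    """
    k_t_m_per_h = k_t_cm_per_h(u10, t_celsius, n) / 100.0
    return k_t_m_per_h * k_h * (pco2_water - pco2_air)


# Hypothetical conditions: 2 m/s wind, 20 C water, K_H of about 0.039 mol L-1 atm-1.
source_flux = fco2_mmol_m2_h(1789.0, 420.0, u10=2.0, t_celsius=20.0, k_h=0.039)
sink_flux = fco2_mmol_m2_h(198.14, 420.0, u10=2.0, t_celsius=20.0, k_h=0.039)
print(f"supersaturated water:  {source_flux:+.3f} mmol m-2 h-1")
print(f"undersaturated water:  {sink_flux:+.3f} mmol m-2 h-1")
```

The sign convention follows Eq. (5): a positive value means evasion to the atmosphere and a negative value means uptake by the lake. Absolute magnitudes depend on the unit conventions assumed above and should not be compared directly with the values reported in the text.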
    Statistical analysis

    In the present study, statistical analyses were performed in IBM SPSS Statistics 22 (IBM Corp., USA), adopting a 95% confidence level (α = 0.05). Data were visualized using SigmaPlot 14.0 (Systat Software Inc., USA).

    Data availability

    All data generated or analysed during this study are included in this published article and its supplementary information files.
    References

    Saunois, M. et al. The global methane budget 2000–2017. Earth Syst. Sci. Data 12, 1561–1623 (2020).
    Li, X., Shi, F., Ma, Y., Zhao, S. & Wei, J. Significant winter CO2 uptake by saline lakes on the Qinghai–Tibet plateau. Glob. Change Biol. 28, 2041–2052 (2022).
    Sun, H. et al. Drivers of spatial and seasonal variations of CO2 and CH4 fluxes at the sediment water interface in a shallow eutrophic lake. Water Res. 222, 118916 (2022).
    Yang, R. et al. Significant daily CO2 source–sink interchange in an urbanizing lake in Southwest China. Water 15, 3365 (2023).
    Yang, R. et al. Assessing the landscape ecological health (LEH) of wetlands: research content and evaluation methods (2000–2022). Water 15, 2410 (2023).
    Yang, R. et al. The shifting pattern of CO2 source sink in a subtropical urbanizing lightly eutrophic lake. Sci. Total Environ. 946, 174376 (2024).
    Verpoorter, C., Kutser, T., Seekell, D. A. & Tranvik, L. J. A global inventory of lakes based on high–resolution satellite imagery. Geophys. Res. Lett. 41, 6396–6402 (2014).
    Tranvik, L. J. et al. Lakes and reservoirs as regulators of carbon cycling and climate. Limnol. Oceanogr. 54, 2298–2314 (2009).
    Yang, R., Xu, Z., Liu, S. & Xu, Y. J. Daily pCO2 and CO2 flux variations in a subtropical mesotrophic shallow lake. Water Res. 153, 29–38 (2019).
    Yang, R. et al. Daily variations in pCO2 and fCO2 in a subtropical urbanizing lake. Front. Earth Sci. 9, 805276 (2022).
    Cole, J. J., Caraco, N. F., Kling, G. W. & Kratz, T. K. Carbon dioxide supersaturation in the surface waters of lakes. Science 265, 1568–1570 (1994).
    Wen, Z. et al. Re-estimating China’s lake CO2 flux considering spatiotemporal variability. Environ. Sci. Ecotech. 19, 100337 (2024).
    Holgerson, M. A. & Raymond, P. A. Large contribution to inland water CO2 and CH4 emissions from very small ponds. Nat. Geosci. 9, 222–226 (2016).
    Cole, J. J. et al. Plumbing the global carbon cycle: integrating inland waters into the terrestrial carbon budget. Ecosystems 10, 171–184 (2007).
    Zhang, L., Xu, Y. J. & Li, S. Changes in CO2 concentration and degassing of eutrophic urban lakes associated with algal growth and decline. Environ. Res. 237, 117031 (2023).
    Li, S. et al. Large greenhouse gases emissions from China’s lakes and reservoirs. Water Res. 147, 13–24 (2018).
    Ran, L. et al. Substantial decrease in CO2 emissions from Chinese inland waters due to global change. Nat. Commun. 12, 1730 (2021).
    Zhang, L., Xu, Y. J. & Li, S. Source and quality of dissolved organic matter in streams are reflective to land use/land cover, climate seasonality and pCO2. Environ. Res. 216, 114608 (2023).
    Downing, J. A. et al. The global abundance and size distribution of lakes, ponds, and impoundments. Limnol. Oceanogr. 51, 2388–2397 (2006).
    Lin, P. et al. Hotspots of riverine greenhouse gas (CH4, CO2, N2O) emissions from Qinghai Lake Basin on the northeast Tibetan Plateau. Sci. Total Environ. 857, 159373 (2023).
    Karim, A., Dubois, K. & Veizer, J. Carbon and oxygen dynamics in the Laurentian Great Lakes: implications for the CO2 flux from terrestrial aquatic systems to the atmosphere. Chem. Geol. 281, 133–141 (2011).
    Raymond, P. A. et al. Global carbon dioxide emissions from inland waters. Nature 503, 355–359 (2013).
    Abril, G. et al. Amazon river carbon dioxide outgassing fuelled by wetlands. Nature 505, 395–398 (2014).
    ADS 
    CAS 
    PubMed 

    Google Scholar 
    Wang, G., Liu, S., Sun, S. & Xia, X. Unexpected low CO2 emission from highly disturbed urban inland waters. Environ. Res. 235, 116689 (2023).Article 
    CAS 
    PubMed 

    Google Scholar 
    Foley, E. & Steinman, A. D. Urban lake water quality responses to elevated road salt concentrations. Sci. Total Environ. 905, 167139 (2023).Article 
    CAS 
    PubMed 

    Google Scholar 
    Xiao, Q. et al. Management actions mitigate the risk of carbon dioxide emissions from urban lakes. J. Environ. Manag. 344, 118626 (2023).Article 
    CAS 

    Google Scholar 
    Li, S., Luo, J., Xu, Y. J., Zhang, L. & Ye, C. Hydrological seasonality and nutrient stoichiometry control dissolved organic matter characterization in a headwater stream. Sci. Total Environ. 807, 150843 (2022).Article 
    CAS 
    PubMed 

    Google Scholar 
    Zhu, Z. et al. Land–water transport and sources of nitrogen pollution affecting the structure and function of riverine microbial communities. Environ. Sci. Technol. 57, 2726–2738 (2023).Article 
    ADS 
    PubMed 

    Google Scholar 
    Pacheco, F. S., Roland, F. & Downing, J. A. Eutrophication reverses whole-lake carbon budgets. Inland. Waters. 4, 41–48 (2013).Article 

    Google Scholar 
    Finlay, K. et al. Decrease in CO2 efflux from Northern Hardwater lakes with increasing atmospheric warming. Nature 519, 215–218 (2015).Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar 
    Tao, Y. et al. Evolution of CO2 flux over 60 years: identifying source and sink changes caused by eutrophication of Hulun Lake. Sci. Total Environ. 953, 176052 (2024).Article 
    CAS 

    Google Scholar 
    Xiao, Q. et al. Lakes shifted from a carbon dioxide source to a sink over past two decades in China. Sci. Bull. 69, 1857–1861 (2024).Article 

    Google Scholar 
    Tang, W., Xu, X. J. & Li, S. Rapid urbanization effects on partial pressure and emission of CO2 in three rivers with different urban intensities. Ecol. Indic. 125, 107515 (2021).Article 
    CAS 

    Google Scholar 
    Wang, J. et al. pCO2 and CO2 evasion from two small suburban rivers: implications of the watershed urbanization process. Sci. Total Environ. 788, 147787 (2021).Article 
    CAS 
    PubMed 

    Google Scholar 
    Zhou, T. et al. Characteristics and influencing factors of CO2 emission from inland waters in China. Sci. China Earth Sci. 67, 2034–2055 (2024).Article 
    ADS 
    CAS 

    Google Scholar 
    Zhou, C. et al. Cyanobacteria decay alters CH4 and CO2 produced hotspots along vertical sediment profiles in eutrophic lakes. Water Res. 265, 122319 (2024).Article 
    CAS 
    PubMed 

    Google Scholar 
    van Bergen, T. J. H. M. et al. Seasonal and daytime variation in greenhouse gas emissions from an urban pond and its major drivers. Limnol. Oceanogr. 64, 2129–2139 (2019).Article 
    ADS 

    Google Scholar 
    Rentschler, J. et al. Global evidence of rapid urban growth in flood zones since 1985. Nature 622, 87–92 (2023).Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar 
    Waajen, G., van Oosterhout, F., Douglas, G. & Lurling, M. Geo-engineering experiments in two urban ponds to control eutrophication. Water Res. 97, 69–82 (2016).Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar 
    Xiao, Q. et al. Eutrophic lake Taihu as a significant CO2 source during 2000–2015. Water Res. 170, 115331 (2020).Article 
    CAS 
    PubMed 

    Google Scholar 
    Xiao, Q. T. et al. Environmental investments decreased partial pressure of CO2 in a small eutrophic urban lake: evidence from long-term measurements. Environ. Pollut. 263, 114433 (2020).Article 
    CAS 
    PubMed 

    Google Scholar 
    Pickard, A. et al. Greenhouse gas budgets of severely polluted urban lakes in India. Sci. Total Environ. 798, 149019 (2021).Article 
    CAS 
    PubMed 

    Google Scholar 
    Ran, L. et al. Seasonal and diel variability of CO2 emissions from a semiarid hard-water reservoir. J. Hydrol. 608, 127652 (2022).Article 
    CAS 

    Google Scholar 
    Zhao, F. et al. Seasonal pattern of diel variability of CO2 efflux from a large eutrophic lake. J. Hydrol. 645, 132259 (2024).Article 
    CAS 

    Google Scholar 
    Sobek, S. et al. The catchment and climate regulation of pCO2 in boreal lakes. Glob. Change Biol. 9, 630–641 (2003).Article 
    ADS 

    Google Scholar 
    Marotta, H. et al. Long-term CO2 variability in two shallow tropical lakes experiencing episodic eutrophication and acidification events. Ecosystems 13, 382–392 (2010).Article 
    CAS 

    Google Scholar 
    Kosten, S. et al. Climate–dependent CO2 emissions from lakes. Glob. Biogeochem. Cycles. 24, GB2007 (2010).Article 
    ADS 

    Google Scholar 
    Xu, Y. J., Xu, Z. & Yang, R. Rapid daily change in surface water pCO2 and CO2 evasion: a case study in a subtropical eutrophic lake in Southern USA. J. Hydrol. 570, 486–494 (2019).Article 
    ADS 
    CAS 

    Google Scholar 
    Katkov, E. & Fussmann, G. F. The effect of increasing temperature and pCO2 on experimental pelagic freshwater communities. Limnol. Oceanogr. 68, S202–S216 (2023).Article 
    ADS 

    Google Scholar 
    Pu, J. et al. Varying thermal structure controls the dynamics of CO2 emissions from a subtropical reservoir, South China. Water Res. 178, 115831 (2020).Article 
    CAS 
    PubMed 

    Google Scholar 
    Attermeyer, K. et al. Carbon dioxide fluxes increase from day to night across European streams. Commun. Earth Environ. 2, 118 (2021).Article 
    ADS 

    Google Scholar 
    Couturier, M. et al. Long–term trends in pCO2 in lake surface water following rebrowning. Geophys. Res. Lett. 49, e2022GL097973 (2022).Article 
    ADS 
    CAS 

    Google Scholar 
    Jiang, B. et al. Wetland CH4 and CO2 emissions show opposite temperature dependencies along global climate gradients. Catena. 248, 108557 (2025).Article 
    CAS 

    Google Scholar 
    MacIntyre, S. et al. Buoyancy flux, turbulence, and the gas transfer coefficient in a stratified lake. Geophys. Res. Lett. 37, L24604 (2010).Article 
    ADS 

    Google Scholar 
    Wang, X. et al. Spatial dynamics of pCO2 and CO2 emissions from eutrophic lakes. Ecol. Indic. 166, 112529 (2024).Article 
    CAS 

    Google Scholar 
    Wang, Y. et al. Diel variability of carbon dioxide concentrations and emissions in a largest urban lake, central china: insights from continuous measurements. Sci. Total Environ. 912, 168987 (2024).Article 
    CAS 
    PubMed 

    Google Scholar 
    Sun, H. et al. Eutrophication decreased CO2 but increased CH4 emissions from lake: a case study of a shallow lake Ulansuhai. Water Res. 201, 117363 (2021).Article 
    CAS 
    PubMed 

    Google Scholar 
    Potter, L. & Xu, Y. J. Can a eutrophic lake function as a carbon sink? Case study of a subtropical eutrophic lake in Southern USA. J. Hydrol. 625, 130071 (2023).Article 
    CAS 

    Google Scholar 
    Yang, P. et al. Variation characteristics and influencing mechanism of CO2 flux from lakes in the Badain Jaran desert: a case study of Yindeer lake. Ecol. Indic. 127, 107731 (2021).Article 
    CAS 

    Google Scholar 
    Li, S. Y., Bush, R. T., Ward, N. J., Sullivan, L. A. & Dong, F. Y. Air–water CO2 outgassing in the lower lakes (Alexandrina and Albert, Australia) following a millennium drought. Sci. Total Environ. 542, 453–468 (2016).Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar 
    Ritchie, H., Samborska, V., Roser, M. & Urbanization Our world in data. https://ourworldindata.org/urbanization?source=content_type:react|first_level_url:article|section:main_content|button:body_link (2024).Peacock, M. et al. Small artificial waterbodies are widespread and persistent emitters of methane and carbon dioxide. Glob. Change Biol. 27, 5109–5123 (2021).Article 
    ADS 
    CAS 

    Google Scholar 
    Fan, L. et al. Spatiotemporal patterns and drivers of CH4 and CO2 fluxes from rivers and lakes in highly urbanized areas. Sci. Total Environ. 918, 170689 (2024).Article 
    CAS 
    PubMed 

    Google Scholar 
    Wang, L. et al. Utilization patterns strongly dominated the dynamics of CO2 and CH4 emissions from small artificial lakes. J. Environ. Manag. 373, 123613 (2025).Article 
    CAS 

    Google Scholar 
    Nimick, D. A., Gammons, C. H. & Parker, S. R. Diel biogeochemical processes andtheir effect on the aqueous chemistry of streams: a review. Chem. Geol. 283, 3–17 (2011).Article 
    ADS 
    CAS 

    Google Scholar 
    Shao, C. L. et al. Diurnal to annual changes in latent, sensible heat, and CO2 fluxes over a Laurentian great lake: a case study in Western lake Erie. J. Geophys. Res. Biogeosci. 120, 1587–1604 (2015).Article 
    CAS 

    Google Scholar 
    Jia, J. et al. Determining whether Qinghai–Tibet plateau waterbodies have acted like carbon sinks or sources over the past 20 years. Sci. Bull. 67, 2345–2357 (2022).Article 

    Google Scholar 
    Li, S., Luo, J., Wu, D. & Xu, Y. J. Carbon and nutrients as indictors of daily fluctuations of pCO2 and CO2 flux in a river draining a rapidly urbanizing area. Ecol. Indic. 109, 105821 (2020).Article 
    CAS 

    Google Scholar 
    Cole, J. J. & Caraco, N. F. Carbon in catchments: connecting terrestrial carbon losses with aquatic metabolism. Mar. Freshw. Res. 52, 101–110 (2001).Article 
    CAS 

    Google Scholar 
    Cole, J. J. et al. Plumbing the global carbon cycle: integrating inland waters into the terrestrial carbon budget. Ecosystems 10, 172–185 (2007).Article 

    Google Scholar 
    Spafford, L. & Risk, D. Spatiotemporal variability in lake-atmosphere net CO2 exchange in the Littoral zone of an oligotrophic lake. J. Geophys. Res. Biogeosci. 123, 1260–1276 (2018).Article 
    CAS 

    Google Scholar 
    Rõõm, E. et al. Years are not brothers: Two-year comparison of greenhouse gas fluxes in large shallow lake Võrtsjärv, Estonia. J. Hydrol. 519, 1594–1606 (2015).Article 

    Google Scholar 
    Cole, J. J., Pace, M. L., Carpenter, S. R. & Kitchell, J. F. Persistence of net heterotrophy in lakes during nutrient addition and food web manipulations. Limnol. Oceanogr. 45, 1718–1730 (2000).Article 
    ADS 

    Google Scholar 
    Morales-Pineda, M. et al. Daily, biweekly, and seasonal temporal scales of pCO2 variability in two stratified Mediterranean reservoirs. J. Geophys. Res. Biogeosci. 119, 509–520 (2014).Article 
    CAS 

    Google Scholar 
    Sabine, C. L. et al. The oceanic sink for anthropogenic CO2. Science 305, 367–371 (2004).Article 
    ADS 
    CAS 
    PubMed 

    Google Scholar 
    Lai, C. et al. Control of carbon dioxide exchange fluxes by rainfall and biological carbon pump in karst river–lake systems. Sci. Total Environ. 937, 173486 (2024).Article 
    CAS 
    PubMed 

    Google Scholar 
    Zavoruev, V. V. et al. Daily course of CO2 fluxes in the atmosphere-water system and variable fluorescence of phytoplankton during the open–water period for lake Baikal according to long-term measurements. Dokl. Earth Sci. 479, 507–510 (2018).Article 
    ADS 
    CAS 

    Google Scholar 
    Du, Q. et al. Factors controlling evaporation and the CO2 flux over an open water lake in Southwest of China on multiple Temporal scales. Int. J. Climatol. 38, 4723–4739 (2018).Article 

    Google Scholar 
    Ni, M., Li, S., Luo, J. & Lu, X. CO2 partial pressure and CO2 degassing in the Daning river of the upper Yangtze river. China J. Hydrol. 569, 483–494 (2019).Article 
    ADS 
    CAS 

    Google Scholar 
    Wilson-McNeal, A. et al. Fluctuating seawater pCO2/pH induces opposing interactions with copper toxicity for two intertidal invertebrates. Sci. Total Environ. 748, 141370 (2020).Article 
    CAS 
    PubMed 

    Google Scholar 
    Minor, E. C. & Brinkley, G. Alkalinity, pH, and pCO2 in the Laurentian great lakes: an initial view of seasonal and inter-annual trends. J. Great Lakes Res. 48, 502–511 (2022).Article 
    CAS 

    Google Scholar 
    Schrier-Uijl, A. P. et al. Release of CO2 and CH4 from lakes and drainage ditches in temperate wetlands. Biogeochemistry 102, 265–279 (2011).Article 
    CAS 

    Google Scholar 
    Wu, Y. et al. Ocean acidification enhances the growth rate of larger diatoms. Limnol. Oceanogr. 59, 1027–1034 (2014).Article 
    ADS 
    CAS 

    Google Scholar 
    Duarte, C. M. et al. CO2 emissions from saline lakes: A global estimate of a surprisingly large flux. J. Geophys. Res. Biogeosci. 113, 7 (2008).Article 

    Google Scholar 
    Finlay, K., Leavitt, P. R., Wissel, B. & Prairie, Y. T. Regulation of Spatial and Temporal variability of carbon flux in six hard–water lakes of the Northern great plains. Limnol. Oceanogr. 54, 2553–2564 (2009).Article 
    ADS 
    CAS 

    Google Scholar 
    Carvalho, A. C. O. et al. Phytoplankton strengthen CO2 uptake in the South Atlantic ocean. Prog. Oceanogr. 190, 102476 (2021).Article 

    Google Scholar 
    Li, D. et al. Metal-algae interaction contributes to the water environment heterogeneity in an urbanized river. Ecol. Indic. 139, e108875 (2022).Article 

    Google Scholar 
    Wang, P., Ma, J., Wang, X. & Tan, Q. Rising atmospheric CO2 levels result in an earlier cyanobacterial bloom-maintenance phase with higher algal biomass. Water Res. 185, 116267 (2020).Article 
    CAS 
    PubMed 

    Google Scholar 
    Xu, Z. & Xu, Y. J. Rapid field Estimation of biochemical oxygen demand in a subtropical eutrophic urban lake with chlorophyll a fluorescence. Environ. Monit. Assess. 187, 4171 (2015).Article 
    PubMed 

    Google Scholar 
    Ngochera, M. J. & Bootsma, H. A. Spatial and Temporal dynamics of pCO2 and CO2 flux in tropical lake Malawi. Limnol. Oceanogr. 65, 1594–1607 (2020).Article 
    ADS 
    CAS 

    Google Scholar 
    Weyhenmeyer, G. A. et al. Significant fraction of CO2 emissions from boreal lakes derived from hydrologic inorganic carbon inputs. Nat. Geosci. 8, 933–936 (2015).Article 
    ADS 
    CAS 

    Google Scholar 
    Zagarese, H. E. et al. Patterns of CO2 concentration and inorganic carbon limitation of phytoplankton biomass in agriculturally eutrophic lakes. Water Res. 190, 116715 (2021).Article 
    CAS 
    PubMed 

    Google Scholar 
    He, H. et al. Lake metabolic processes and their effects on the carbonate weathering CO2 sink: insights from diel variations in the hydrochemistry of a typical karst lake in SW China. Water Res. 222, 118907 (2022).Article 
    CAS 
    PubMed 

    Google Scholar 
    Rantakari, M. & Kortelainen, P. Interannual variation and Climatic regulation of the CO2 emission from large boreal lakes. Glob. Change Biol. 11, 1368–1380 (2005).Article 
    ADS 

    Google Scholar 
    Benassi, R. F. et al. Eutrophication effects on CH4 and CO2 fluxes in a highly urbanized tropical reservoir (Southeast, Brazil). Environ. Sci. Pollut. Res. 28, 42261–42274 (2021).Article 
    CAS 

    Google Scholar 
    Engel, F., Attermeyer, K. & Weyhenmeyer, G. A. A simplified approach to detect a significant carbon dioxide reduction by phytoplankton in lakes and rivers on a regional and global scale. Sci. Nat. 107, 29 (2020).Article 
    CAS 

    Google Scholar 
    Zhou, M. et al. Space-for-time substitution leads to carbon emission overestimation in eutrophic lakes. Environ. Res. 219, 115175 (2023).Article 
    CAS 
    PubMed 

    Google Scholar 
    Shi, W., Du, M., Ye, C. & Zhang, Q. Divergent effects of hydrological alteration and nutrient addition on greenhouse gas emissions in the water level fluctuation zone of the three Gorges Reservoir, China. Water Res. 201, 117308 (2021).Article 
    CAS 
    PubMed 

    Google Scholar 
    Kortelainen, P. et al. Sediment respiration and lake trophic state are important predictors of large CO2 evasion from small boreal lakes. Glob. Change Biol. 12, 1554–1567 (2006).Article 
    ADS 

    Google Scholar 
    Loken, L. C. et al. Large spatial and temporal variability of carbon dioxide and methane in a eutrophic lake. J. Geophys. Res. Biogeosci. 124, 2248–2266 (2019).Article 
    ADS 
    CAS 

    Google Scholar 
    Natchimuthu, S., Sundgren, I., Gålfalk, M., Klemedtsson, L. & Bastviken, D. Spatiotemporal variability of lake pCO2 and CO2 fluxes in a hemiboreal catchment. J. Geophys. Res. Biogeo. 122, 30–49 (2017).Article 
    CAS 

    Google Scholar 
    Gu, B., Schelske, C. L. & Coveney, M. F. Low carbon dioxide partial pressure in a productive subtropical lake. Aquat. Sci. 73, 317–330 (2011).Article 
    CAS 

    Google Scholar 
    Reis, P. C. J. & Barbosa, F. A. R. Diurnal sampling reveals significant variation in CO2 emission from a tropical productive lake. Braz. J. Biol. 74, 113–119 (2014).Article 

    Google Scholar 
    Reiman, J. H. & Xu, Y. J. Daytime variability of pCO2 and CO2 outgassing fromthe lower Mississippi river: implications for riverine CO2 outgassing Estimation. Water 11, 43 (2019).Article 
    CAS 

    Google Scholar 
    Liu, S. et al. Spatiotemporal dynamics of constructed wetland landscape patterns during rapid urbanization in Chengdu, China. Land 13, 806 (2024).Article 

    Google Scholar 
    The National Environmental Protection Agency (NEPA) & The Editorial Board of Water and Wastewater Monitoring/Analysis Methods. The Monitoring and Analysis Methods of Water and Wastewater, 4th edn (China Environmental Science, 2002).Cole, J. J. & Caraco, N. F. Atmospheric exchange of carbon dioxide in a low-wind oligotrophic lake measured by the addition of SF6. Limnol. Oceanogr 43, 647–656 (1998).Article 
    ADS 
    CAS 

    Google Scholar 
    Abril, G. et al. Technical note: large overestimation of pCO2 calculated from pH and alkalinity in acidic, organic-rich freshwaters. Biogeosciences 12, 67–78 (2015).Article 
    ADS 

    Google Scholar 
    Cai, W. J. & Wang, Y. The chemistry, flux, and sources of carbon dioxide in the estuarine waters of the Satilla and Altamaha rivers. Ga. Limnol. Oceanogr. 43, 657–668 (1998).Article 
    ADS 
    CAS 

    Google Scholar 
    Weiss, R. F. The solubility of nitrogen, oxygen and argon in water and seawater. Deep Sea Res. Oceanogr. Abstr. 17, 721–735 (1970).Article 
    ADS 
    CAS 

    Google Scholar 
    Jahne, B., Heinz, G. & Dietrich, W. Measurement of the diffusion coefficients of sparingly soluble gases in water. J. Geophys. Res. Oceans. 92, 10767–10776 (1987).Article 
    ADS 

    Google Scholar 
    Guérin, F. et al. Gas transfer velocities of CO2 and CH4 in a tropical reservoir and its river downstream. J. Mar. Syst. 66, 161–172 (2007).Article 

    Google Scholar 
    Acknowledgements
    We sincerely thank all the assistants at Sichuan Agricultural University for their assistance, and Dr. Yijun Xu at Louisiana State University, LA, U.S.A., for guidance in early experimental design. Sincere thanks to all the institutions for their support in testing, and to the Management Center of Bailuwan Lake for their permission and facilitation.

    Funding
    This work was partially supported by the Hai-Ju Program for the Introduction of High-end Talents in Sichuan Provincial Science and Technology Programs (Grant no. 2024JDHJ0017), the Project Supported by Sichuan Landscape and Recreation Research Center (Grant no. JGYQ2024011), the Provincial Innovation Training Program of Sichuan College Students (Grant no. S202510626056), and the Undergraduate Scientific Research Interest Cultivation and Entrepreneurship Training Program Projects at Sichuan Agricultural University (Grant no. 20252046).

    Author information
    Authors and Affiliations
    College of Landscape Architecture, Sichuan Agricultural University, Chengdu, 611130, China
    Shiliang Liu, Yingying Chen, Rongjie Yang, Yuling Qiu, Aamir Mehmood Shah, Kezhu Lu, Xinyu Wang, Di Li, Xinhao Cao & Qibing Chen
    School of Tourism and Culture Industry, Chengdu University, Chengdu, 610106, China
    Rongjie Yang
    Geophysical Exploration Brigade, Hubei Geological Bureau, Wuhan, 430100, China
    Di Li
    Key Laboratory of Forest and Wetland Conservation in Sichuan Province, Sichuan Academy of Forestry, Chengdu, 610081, China
    Wenbao Ma

    Contributions
    S.L.: Conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, resources, software, validation, visualization, writing–original draft, writing–review and editing. Y.C.: Data curation, formal analysis, investigation, software, validation, visualization. R.Y.: Conceptualization, data curation, formal analysis, investigation. Y.Q.: Data curation, formal analysis, investigation. A.M.S.: Writing–review and editing. K.L.: Data curation, formal analysis, investigation. X.W.: Data curation. D.L.: Data curation, investigation. W.M.: Data curation, writing–review and editing. X.C.: Data curation. Q.C.: Conceptualization, funding acquisition, project administration, resources, supervision, validation.

    Corresponding authors
    Correspondence to Shiliang Liu or Qibing Chen.

    Ethics declarations

    Competing interests
    The authors declare no competing interests.

    Additional information
    Publisher’s note
    Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Supplementary Information
    Below is the link to the electronic supplementary material.
    Supplementary Material 1

    Rights and permissions
    Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

    About this article
    Cite this article
    Liu, S., Chen, Y., Yang, R. et al. High-frequency monitoring reveals a CO2 source-sink shift in a subtropical eutrophic urban lake. Sci Rep 15, 43212 (2025). https://doi.org/10.1038/s41598-025-27331-z
    Received: 25 July 2025
    Accepted: 03 November 2025
    Published: 05 December 2025
    Version of record: 05 December 2025
    DOI: https://doi.org/10.1038/s41598-025-27331-z

    Keywords
    CO2 source‒sink
    pCO2/fCO2
    Eutrophic urban lake
    Water–air CO2 exchange
    High-frequency monitoring

  • in

    Constructing lithium-exclusive pathways in LiFePO4 electrodes

    Selective shielding of iron d-orbitals directs the crystal growth of lithium iron phosphate nanosheets to expose only the lithium-selective [100] facet, enabling highly selective, efficient and scalable lithium extraction from low-grade brines.

    Fig. 1: [100]-only LiFePO4 nanosheets enable selective lithium extraction.

    Author information
    Authors and Affiliations
    UQ Dow Centre for Sustainable Engineering Innovation, School of Chemical Engineering, The University of Queensland, St Lucia, Queensland, Australia
    Ming Yong & Xiwang Zhang
    ARC Centre of Excellence for Green Electrochemical Transformation of Carbon Dioxide (GETCO2), Brisbane, Queensland, Australia
    Ming Yong & Xiwang Zhang

    Corresponding author
    Correspondence to Xiwang Zhang.

    Ethics declarations

    Competing interests
    The authors declare no competing interests.

    About this article
    Cite this article
    Yong, M., Zhang, X. Constructing lithium-exclusive pathways in LiFePO4 electrodes. Nat Water (2025). https://doi.org/10.1038/s44221-025-00550-4
    Published: 03 December 2025
    Version of record: 03 December 2025
    DOI: https://doi.org/10.1038/s44221-025-00550-4

  • in

    Cities are embracing nature for flood defence

    San Jose was flooded in 2017 when storm water in Coyote Creek was funnelled into the city.Credit: Noah Berger/AFP via GettyCoyote Creek’s waters rose fast in February 2017 amid a series of storms. Rainfall filled the small river in northern California, which runs 103 kilometres from its headwaters near Morgan Hill to San Francisco Bay. In San Jose, where the river was forced into a channel tightly constrained by development, water surged from the creek. The resulting flooding forced 14,000 people to evacuate and caused more than US$73 million of damage.In the wake of the disaster, environmental activists opposed to San Jose’s sprawl into Coyote Valley on the city’s southern edge saw an opportunity. Their plan was to convince San Jose officials that the city could avoid worse flooding by choosing not to pave over some of the last open space in the watershed.Nature Outlook: CitiesTheir approach has solid scientific roots. When land is paved, rain cannot soak into the soil, increasing the risk of flooding. One study1 found that, for every 1% increase in the area of roads, pavements and car parks, the annual flood magnitude in nearby waterways increases by 3.3%.A growing understanding of this connection has led many cities to start de-paving small areas, digging and planting bioswales to absorb storm-water run-off, offering incentives for green roofs, and levying higher taxes on properties with a lot of impervious surface area. But the proposal by San Jose’s environmentalists was different. Their aim was not to make the city itself more permeable to water, but to reduce the risk of flooding by taking action upstream in the watershed, beyond the city’s urban footprint.The 2017 flood provided the impetus to act. In the following year, San Jose’s voters approved a bond for about half of the $93 million required to buy 380 hectares of North Coyote Valley. 
“I don’t think we would have considered putting it on the ballot, or even considered that flooding was an issue, until we were devastated by the water damage in San Jose,” says the city’s current vice-mayor, Pam Foley.The city of San Jose and conservation organizations have invested more than $120 million in purchasing 600 hectares of land in north and mid Coyote Valley. In 2021, the city council voted unanimously to change the land-use designations for the area, effectively barring any new development.San Jose’s conservation of Coyote Valley reflects how land care is increasingly seen as crucial to managing flood risk. This marks a radical departure from the twentieth-century approach of trying to engineer water into submission. Conventional flood defences might also be needed, but San Jose is not alone in adopting more-natural methods of water management.More than climateAs the climate warms, the atmosphere can hold more water. In San Jose, this is causing both more-intense droughts and larger storms bringing heavy rain — a recipe for damaging floods. But climate change is not the only culprit behind the increased flooding and losses, says Dominik Paprotny, a geographer at the University of Szczecin in Poland. “It’s always assumed that the flood losses are going up and it’s due to climate change,” he says, but in reality “it’s a complex, human-related process”.In August, Paprotny and colleagues published a study2 breaking down the various drivers of flood risk across Europe. They found that development that degrades the environment is “more relevant than climate change” in causing flood losses, he says.Since 1992, cities around the world have built homes and businesses on floodplains spanning an area the size of Ukraine. This development encourages people to move into harm’s way. 
Floodplains exist to absorb floods, so people living on them are highly likely to encounter water damage sooner or later.

    The fact that people continue to build in places that are likely to flood is partly the result of misaligned incentives, says Paprotny. Insurance companies and national governments typically compensate flood losses, or individual property owners take the hit. Developers and local authorities are rarely on the hook. “For them, there is only the benefit,” he says.

    In the city of Szczecin, Paprotny has seen numerous tower blocks built in a formerly marshy area of the Odra River. He expects them to experience flooding as sea levels rise. About 100 kilometres downriver, the coastal town of Międzyzdroje has approved a residential building right on the beach. “Local authorities have fantasies of building everything, everywhere,” Paprotny says.

    Short-sighted planning and the expansion of paved areas are not the only factors that increase a city’s flood risk. Healthy soil contains more than half of Earth’s species, including microorganisms, springtails, arachnids, worms and fungi, which turn their mineral home into a matrix that absorbs vast amounts of water. But both inside the city limits and in the wider watershed, pesticides used on lawns and in industrial agriculture are killing these creatures, decreasing the soil’s ability to hold water.

    The good news is that flood managers have much more agency over the local environment than they do over global climate change.

    Restoring rivers

    Throughout history, cities have sprouted along waterways. As residents and industry polluted the rivers and the demand for centralized real estate grew, cities across the world have followed a standard development plan: fill in wetlands and creeks, or bury them in underground pipes, and then build on top. 
The small percentage of urban waterways that remain at the surface have typically been straightened, in a misguided attempt to avoid flooding by speeding the water away. The subsequent scouring and erosion prompted people to protect the banks with sandbags or concrete.

    Milwaukee’s Kinnickinnic River went through this transformation in the 1960s. Its watershed, one of six that run through the Milwaukee metropolitan region into Lake Michigan, is the most heavily urbanized in Wisconsin. In the past few decades, straightening and paving rivers has fallen out of favour globally, says Bill Graffin, public-information manager for the Milwaukee Metropolitan Sewerage District (MMSD), which serves more than 1 million people.

    Curves, rocks and vegetation are being reintroduced to Milwaukee’s Kinnickinnic River. Credit: Bill Graffin

    Conventional concrete flood control narrows the space for water and accelerates the flow through rivers. This makes them dangerous in storms. “When they’re full, they’re extremely powerful and fast-moving: 20 feet per second, with a pressure of 400 to 500 pounds,” Graffin says. Milwaukee has seen numerous drowning deaths. “If you fall in, there’s a good chance you’re not getting out.”

    As climate change brings more-severe deluges, concrete channels cannot cope. “It’s the intensity that really throws a wrench into urban planning,” says Graffin. A storm in mid-August this year brought 37 centimetres of rainfall in one area of the county, with an average of 25 centimetres over a 24-hour period. “We got nailed,” he says. The subsequent flooding damaged public infrastructure and more than 4,500 homes and commercial buildings, causing $76 million in damage. Building concrete solutions that can deal with the increased volume from the biggest rainfall events “can get very, very, very costly”, Graffin says. 
“Green infrastructure can be done cost-effectively.”

    With that in mind, Milwaukee is expanding the volume for flood waters by restoring the Kinnickinnic River to a more natural state. The MMSD is beginning to remove the industrial flood channel and revive more natural aspects of the river, such as reintroducing curves, rocks and streamside vegetation, in eight projects along the waterway and 12 more in its wider watershed.

    The plan, which is projected to cost $496 million, is to expand the lateral space for water, allowing it to slow down. To make space, the MMSD has acquired 83 nearby properties. According to Graffin, there was little opposition from owners because their homes had repeatedly flooded, year after year. “They were thankful to see us knock on the door,” he says.

    Like San Jose, Milwaukee is also looking beyond the city’s footprint to make space for water. Its Green Seams programme is buying undeveloped land, often in natural wetlands. “As soon as we make that purchase, we put that conservation easement onto the land so that it can’t be developed,” says Graffin. The roughly 2,300 hectares of land protected so far, at a cost of $30 million, can store more than 11 billion litres of water. By contrast, a 26-hectare flood storage basin in the city cost $100 million and can hold only 1.2 billion litres.

    Reverse engineering

    An economic study3 by the non-profit organization Resources for the Future, based in Washington DC, bears out the efficacy of using natural wetlands for flood protection. The researchers found that when US communities invest in protecting a wetland, about half of them save that much money in five years by avoiding flood damages. And flood mitigation can benefit properties 50 kilometres away.

    In San Jose, residents might not see such a rapid benefit. The land is still open, but previous changes made for the purposes of agriculture resulted in more water flowing into Coyote Creek. 
“Historically, a lot of the Coyote Valley area was actually this series of discontinuous streams and wetlands, and it didn’t have any kind of surface connection to Coyote Creek,” says Rachel Clemons, a watershed-restoration specialist at the Santa Clara Valley Open Space Authority. Instead of resting on the valley floor, creating a habitat for wildlife or soaking down into the aquifer where it could find its way to Coyote Creek over a longer period of time, the water is funnelled straight into the creek and then through the centre of San Jose (see ‘Upstream opportunity’).

    Engineering of the land began a century ago, by farmers who saw these seasonal wetlands as an obstacle to them growing good crops. In 1916, the farmers started digging drainage ditches leading to the creek, and laying underground tiles and gravel to route water towards the new channels. Their plan worked: water now drains quickly into the human-made Fisher Creek, which carries it swiftly on to Coyote Creek and through the city, Clemons says.

    During the Silicon Valley boom in the 1990s, San Jose-based Cisco Systems was considering Coyote Valley for its headquarters. It built a weir, which blocks most of the water flowing from the southern Coyote Valley into Laguna Seca, a seasonal wetland to the north. When the dot-com bubble burst in the early 2000s, Cisco Systems abandoned the project, but the small dam remains. Water that would otherwise flow into Laguna Seca is instead “shuttling to Fisher Creek directly”, Clemons says.

    The Santa Clara Valley Open Space Authority now manages the 600 hectares of conserved land in Coyote Valley and is working on a plan to partly restore the area’s historical ecology and hydrology. The authority’s natural resources manager, Aaron Hébert, says the agency is considering several restoration options. 
“All of them generally attempt to route some storm-flows from Fisher Creek into an expanded wetland complex in Laguna Seca,” he says, so that not all of the water pours into the city during peak flows. This would involve removing or notching the Cisco Systems dam to allow water to flow to Laguna Seca once again, says Clemons.

    San Jose purchased the land in Coyote Valley “for the purpose of mitigating downstream flooding”, says Hébert. “Not developing the land is obviously a huge benefit for avoiding run-off and storm-water issues.” But breaching the dam so more water can reach the Laguna Seca floodplain “will also help downstream issues, fulfilling the city’s intent”, he adds.

    Structures resembling the dams of beavers are built to slow the water in Coyote Valley. Credit: Open Space Authority

    The Open Space Authority has started other work, such as altering an agricultural drainage ditch by installing two structures that resemble the dams built by beavers. Made of sticks and mud, the structures are inherently temporary, but they can nevertheless jump-start more-complex hydrology by slowing the flow of water, collecting sediment and allowing vegetation to sprout, which slows the flow further. It is “low-cost and simple”, Clemons says. The aim is to back up surface water after storms and keep it from flowing quickly into the river. Individually, such projects create more wetland habitat for wildlife. If replicated throughout the region, they “could potentially have a significant [beneficial] effect on flooding, too”, Clemons adds.

    A place for concrete

    Even so, San Jose needs more than the restored natural hydrology of Coyote Valley to protect it from flooding, say representatives of the local utility company, Valley Water. Jack Xu, a senior engineer at the company, which manages surface water in the area, says that although projects in Coyote Valley offer “a little bit of benefit”, the valley makes up only a small percentage of Coyote Creek’s watershed. 
Most of it flows from the valley’s eastern foothills and is usually captured by a reservoir operated by Valley Water.

    Hébert acknowledges the relative sizes of the watershed areas, but maintains that the Open Space Authority’s projects are still important. “Flooding is a cumulative impact, and every little bit you can do helps,” he says. “Sometimes just 1 foot less of water is the difference between damage and not.”

    Valley Water has responded to the 2017 disaster by instituting some conventional forms of flood control. It is working on more than 7,600 metres of flood walls and other barriers along 14 kilometres of the creek; these are expected to cost $359 million, plus the cost of maintenance. From a financial perspective, Xu says, it would be better to give the creek more room. “We don’t have to maintain anything,” he says. “It would save everyone a lot of headache.” But Valley Water does not think this is an option in San Jose, because “there’s absolutely no room to widen the creek”, Xu says. However, the company did acquire 13 homes for the project.

    If space can be made for water in and around cities, the benefits can be greater than simply reducing urban flooding. Such projects also protect against drought by moving water underground to feed local creeks, wetlands and rivers in the dry season. Recharging water underground also counteracts subsidence, the sinking that cities experience when they pump out too much groundwater. And de-paving helps protect against fires, because well-hydrated plants are less likely to burn.

    Persuading city planners and utility companies to consider such benefits when making decisions can be difficult, because most cost–benefit analyses tend to focus on only one thing, such as how much a levee will reduce the flood risk for the neighbourhood directly behind it. 
Even when land-use planners understand the benefits, they can run into roadblocks if other government bodies in their watershed are reluctant to collaborate.

    But nature’s jurisdictions are inviolable. “Water doesn’t obey city-limit signs,” said Graffin. “It obeys watersheds.”

  • in

    Spatiotemporal variability of surface water quality in tropical agriculture-dominated catchments: insights from water quality indices

    Abstract

    Surface water quality in tropical, agriculture-dominated catchments faces intense pressure from human activities, yet comprehensive, index-based assessments for these regions remain limited. This study aimed to use an index-based assessment to examine the spatial and temporal changes in water quality within the Maziba catchment in southwestern Uganda, characterised by increasing land-use pressures. Monthly surface water samples were collected from 16 stations between July 2023 and June 2024 to analyse physicochemical parameters. The study employed the Weighted Arithmetic Water Quality Index (WAWQI) for assessing drinking water suitability, the Comprehensive Pollution Index (CPI) for evaluating aquatic ecosystem health, and a new combined risk framework to deliver an integrated, stakeholder-oriented assessment. WAWQI results ranged from “good” to “unfit for consumption”, with 69% of stations classified as “poor” to “unfit”. CPI indicated “slight pollution” on average. Notably, the integrated risk assessment did not classify any stations as “Low Risk”, while most were classified as “High Risk” (50.0%) or “Severe Risk” (18.8%). Human activities and seasonal changes have a significant impact on water quality deterioration in the Maziba catchment. The simultaneous decline in water suitability for drinking and ecosystem health underscores the need for integrated management strategies that target both diffuse and point-source pollution to protect public health and aquatic ecosystems.

    Introduction

    Surface water quality in tropical agricultural zones is deteriorating under intense anthropogenic pressure, a trend that jeopardises public health, ecological stability, and sustainable development. The degradation of these water resources, caused by nutrient loading, agrochemical runoff, and soil erosion, is concerning because these regions are crucial for global food security1,2. These impacts are exacerbated by tropical climate patterns, where heavy rainfall increases pollutant transport, and dry seasons concentrate contaminants in reduced water volumes3,4,5. The public health consequences of surface water degradation are extensive on a global scale. While sources like rivers and dams supply nearly one-third of the world’s drinking water, at least two billion people consume water contaminated with faeces6,7.

    Water quality in agricultural catchments is influenced by a complex interplay of natural processes and human activities3,6,8. In tropical regions, the combined effects of agriculture, urbanisation, population growth, and industrialisation are primary drivers of water contamination1,9,10. For instance, inadequate wastewater treatment and extensive fertiliser application lead to nutrient enrichment, fostering excessive algal growth, dissolved oxygen (DO) depletion, and foul odours from anaerobic microbial breakdown6. These factors harm human health and ecosystems by introducing pollution that surpasses the natural capacity for purification11. While natural self-purification processes, driven by bacterial metabolic responses, can break down organic materials when sufficient DO is present12, intensive agricultural practices and urbanisation frequently overwhelm these mechanisms. 
For instance, agricultural intensification and urbanisation have contributed to a decline in water quality in the Niger River in Bamako, Mali, representing a widespread environmental issue in tropical regions13,14.

    Given the increasing pressures on freshwater resources, water quality evaluation has gained importance. Water Quality Indices (WQIs) offer a practical method for this evaluation by transforming multiple water quality parameters into a single numerical value, simplifying interpretation and informing decision-making15. Since their initial proposal by Horton (1965)16 and Brown et al. (1970)17, a variety of WQIs have been developed for different applications and regions. These include the US National Sanitation Foundation Water Quality Index (NSFWQI), the Canadian Council of Ministers of the Environment Water Quality Index (CCMEWQI), the British Columbia Water Quality Index (BCWQI), the Weighted Arithmetic Water Quality Index (WAWQI), the Oregon Water Quality Index (OWQI), and the Comprehensive Pollution Index (CPI)18,19. Each index employs specific parameters, weighting schemes, and aggregation methods tailored to its respective assessment objectives. Beyond assessing current water quality, these indices are valuable tools for trend analysis and supporting environmental management decisions14.

    The WAWQI and CPI are useful because they incorporate parameters relevant to both human consumption and ecological health20. The WAWQI assesses the suitability of drinking water, incorporating parameters such as nutrients, dissolved oxygen, and bacterial indicators that reflect human health risks. The CPI can evaluate ecological conditions, as it effectively identifies waters with multiple stressors and can distinguish between slight and severe pollution levels21,22.

    The Maziba catchment, a transboundary system between Uganda and Rwanda, serves as a prototypical example for the mountainous region bordering the Democratic Republic of Congo (DRC). 
It provides surface water for domestic, agricultural, and hydropower needs. At the same time, its landscape is characterised by the region’s common challenges, including steep topography, high population density, and intensifying land-use pressures. Human activities, including agriculture on steep slopes, deforestation, improper waste disposal, and urbanisation, increasingly threaten water quality through accelerated erosion and pollutant transport. Untreated wastewater from Kabale town is mostly discharged directly into the surface water system. These practices introduce diffuse pollutants, such as sediments from erosion, excess nutrients, and chemical contaminants, which degrade the quality of surface water. Despite the reliance of local communities on surface water, comprehensive data on its current water quality status are lacking. Conventional water quality assessments in the Maziba and other similar tropical catchments are often fragmented23, focusing on individual parameters that fail to provide a holistic picture. Data from official authorities are often difficult to access or unavailable, and few published papers detail the distribution of water quality or provide comprehensive datasets24,25,26,27.

    Given the multiple pollution sources in the Maziba catchment, an integrated assessment approach is essential. The Water Quality Index (WQI) offers this holistic perspective by aggregating multiple parameters into a single, interpretable value that reflects overall water quality28. While studies have demonstrated the effectiveness of WQIs in assessing water quality in agricultural and mixed-use catchments globally29,30, their application to the African tropics and to catchments like the Maziba remains scarce. This study evaluates how multiple anthropogenic pressures and seasonal dynamics influence water quality in the Maziba catchment through an integrated assessment. 
It characterises the spatiotemporal variability of its physicochemical parameters, uses multivariate statistical analysis to identify the principal drivers of water quality changes, and applies the WAWQI and CPI indices to produce a holistic evaluation of its suitability for human and ecological use. These indices are then integrated into a combined risk framework to create a single, stakeholder-oriented classification for management purposes. The complete dataset from this research has also been made publicly available, providing a comprehensive water quality dataset from the African tropics that is rarely published31. This research supports Sustainable Development Goal (SDG) 6 (Clean Water and Sanitation) by evaluating water quality threats and pinpointing pollution sources that endanger safe water access in tropical agricultural areas. The study also advances SDG 15 (Life on Land) by examining ecosystem health impacts and safeguarding aquatic biodiversity.

    Materials and methods

    Study area

    The Maziba catchment, located in southwestern Uganda (29.9°–30.1°E, 1.1°–1.6°S), extends into Rwanda. The catchment covers an area of 722 km² with elevations ranging from 1,757.7 m at Maziba Dam (1.31°S, 30.09°E), which serves as the outlet, to 2,488 m in the north-eastern highlands (Fig. 1). The catchment can be divided into three sections: the Upper Maziba (covering parts of Rubanda and Kabale Districts), the Middle Maziba (covering parts of Rukiga and Ntungamo Districts), and the Lower Maziba (situated within Ntungamo District). The catchment is dominated by subsistence agriculture (~85% of livelihoods), with major crops including maize, Irish and sweet potatoes, bananas, beans, and vegetables for subsistence, while tobacco, coffee, fruits, pyrethrum, sorghum, wheat, and millet serve as the primary cash crops. The region’s steep topography, combined with intensive agricultural land use on hillslopes, results in a high erosion potential and diverse sources of pollution. 
Despite these factors, the overall water quality status throughout the catchment remains poorly understood.

    Fig. 1 Map of the Maziba catchment showing the 16 water quality sampling stations and land cover distribution. The catchment extends from southwestern Uganda into Rwanda, with elevation ranging from around 1,760 m at Maziba Dam to 2,488 m in the northeast. Land cover is dominated by cropland (yellow) with limited tree cover (dark green). Numbers 1–16 with a dark circle/background indicate sampling stations, while numbers with a white background represent sub-catchments listed in Table 1. The Maziba drains towards the Kagera River.

    The climate of the Maziba catchment, based on long-term (1981–2023) meteorological records from Kabale station and discharge data downstream from Maziba Dam, displays distinct seasonal patterns (Fig. 2). Precipitation follows a bimodal pattern, with the main wet seasons typically occurring from March to May and again from September to November or December. Drier conditions typically occur from June to August and in January to February. On average, annual precipitation amounts to about 1,033 mm, with totals ranging from 759 to 1,225 mm. The mean annual temperature remains relatively constant, averaging 18.2 °C, with annual means between 17.3 and 18.8 °C. Potential evapotranspiration (PET), calculated using the Hargreaves method, averages around 1,441 mm, with yearly totals from 1,246 to 1,624 mm. The annual discharge generally reflects the rainfall patterns, peaking after the wet seasons and decreasing during dry periods, with an average of 4.88 m³/s and yearly means from 1.29 to 10.2 m³/s. Note that the years underlying these annual figures differ between parameters because data completeness varies.

    Fig. 2 Monthly seasonality of climate and hydrological parameters for the Maziba catchment, Uganda. 
Daily meteorological data (precipitation and temperature) were obtained from Kabale Meteorological Station, while daily discharge measurements were taken downstream of Maziba Dam. Potential evapotranspiration (PET) was calculated using the Hargreaves method. Boxplots display the median (horizontal line), interquartile range (box), and whiskers extending to 1.5 times the interquartile range. White diamonds indicate the mean. Monthly precipitation and PET are shown as sums, while mean monthly temperature and discharge represent monthly averages of daily values. Data span 1981–2023, with various gaps.

    On-site measurement, sample collection and storage

    Water samples were collected monthly from 16 strategically selected sites across various watersheds of the upper Maziba sub-catchment over a 12-month period (July 2023–June 2024) (Fig. 1; Table 1). Selection criteria were based on land use intensity, population density, and accessibility. On-site measurements of physicochemical parameters, including water temperature (WT), electrical conductivity (EC), dissolved oxygen (DO), pH, and total dissolved solids (TDS), were conducted in accordance with the American Public Health Association (APHA) guidelines (2023)32. A water-resistant handheld pH and EC meter (HI98130) measured temperature, pH, EC and TDS, while DO was measured using a DO meter (PDO-519 model). An Aquafluor™ handheld fluorometer was used to measure turbidity and chlorophyll a (Chl-a) from a well-agitated sample in a cuvette, and readings were recorded after stabilisation. Water samples for laboratory analysis were collected in 1-litre plastic bottles and transported in a cool box to the Ministry of Water and Environment’s National Reference Laboratory (Entebbe) for physicochemical analysis. 
While in the laboratory, samples were stored at 4 °C, awaiting analysis.

    Table 1 Characteristics of the 16 water quality sampling stations in the Maziba sub-catchment, showing station codes, names, contributing sub-catchment areas, total upstream drainage areas, elevation, and geographic coordinates.

    Laboratory analysis

    Total nitrogen (TN), total phosphorus (TP), nitrate nitrogen (NO₃⁻-N), ammonium nitrogen (NH₄⁺-N), nitrite nitrogen (NO₂⁻-N), soluble reactive phosphorus (SRP), chloride (Cl⁻), sulphate (SO₄²⁻), sodium (Na⁺), and potassium (K⁺) were analysed in the laboratory using a discrete photometric analyser (Thermo Scientific Gallery Plus), following the standards set by APHA (2023)32. The fully automated discrete analyser provides repeatable results with minimal error, supporting confidence in the analytical results. Total suspended solids (TSS) were determined by the standard gravimetric method in accordance with APHA (2023) standard guidelines.

    Statistical analysis

    Data analyses were conducted using Statistical Package for the Social Sciences (SPSS) 27.0 and R. Before statistical analysis, the Kolmogorov–Smirnov test was employed to determine whether the data followed a normal distribution. The Kruskal–Wallis test was used for non-normally distributed data, and one-way analysis of variance (ANOVA) for normally distributed data, to assess differences in the measured parameters across stations and months. The Mann–Whitney U test was used to determine whether significant differences existed in measured parameters between seasons, except for normally distributed data, where an independent-samples t-test was employed. A principal component analysis (PCA) was conducted to identify the key physicochemical parameters that contribute most to variability. The Kaiser–Meyer–Olkin (KMO) and Bartlett’s tests confirmed the adequacy of the data for PCA. 
The KMO value of 0.760 exceeded the recommended threshold of 0.6, affirming the suitability of the data for PCA. Bartlett’s test (χ² = 2078.072, df = 171, p < 0.001) yielded significant results, further validating the adequacy of the data for PCA. Spearman’s correlation coefficients were calculated to evaluate the monotonic relationships among water parameters. A significance level of 0.05 was applied to all tests, with a highly significant threshold set at 0.01.

    Water quality indices (WQI)

    Index-based methods simplify water quality assessment by combining multiple parameters into scores that are easy to interpret and compare across sites and seasons, and that support catchment management decisions.

    Weighted arithmetic water quality index (WAWQI) – drinking water suitability

    The Weighted Arithmetic Water Quality Index (WAWQI) by Brown et al. (1972)33, as shown in Eqs. 1–4, assessed drinking water quality against World Health Organisation (WHO) standards34. To provide a more localised context, we used the standards set by the Uganda National Bureau of Standards (UNBS)35 for natural potable water as a reference. In total, thirteen physicochemical parameters (i.e., DO, pH, turbidity, EC, NH₄⁺-N, NO₃⁻-N, SRP, Cl⁻, NO₂⁻-N, Na⁺, SO₄²⁻, total hardness (TH), and temperature) were employed to calculate the WAWQI. The standard values used to compute the index are presented in Table 2. The WAWQI is categorised into five classes: excellent, good, poor, extremely poor, and unfit for consumption, based on index values of 0–25, 26–50, 51–75, 76–100, and over 100, respectively (Table 3).

    The WAWQI is calculated as follows:

    $$\mathrm{WAWQI} = \frac{\sum Q_n W_n}{\sum W_n}$$
    (1)

    with
    Qn = the quality rating of the nth water quality parameter;
    Wn = the unit weight of the nth water quality parameter.

    Qn is computed using Eq. (2).

    $$Q_n = 100 \left[ \frac{V_n - \mathrm{IV}}{S_n - \mathrm{IV}} \right]$$
    (2)

    with
    Vn = the concentration value of the nth variable;
    IV = the ideal value (IV = 0, except for DO (IV = 14.6 mg/L) and pH (IV = 7));
    Sn = the standard permissible value for the nth variable.

    The unit weight (Wn) of each selected physicochemical variable is inversely proportional to the recommended standard/threshold value for that variable.

    $$W_n = \frac{K}{S_n}$$
    (3)

    with
    K = the constant of proportionality computed using Eq. 4.

    $$K = \frac{1}{\sum \frac{1}{S_n}}$$
    (4)
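
    The calculation in Eqs. 1–4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `wawqi` and `classify_wawqi` are hypothetical, the example concentrations and standards are placeholders rather than the Table 2 values, and treating the class boundaries as inclusive upper limits (≤ 25, ≤ 50, ≤ 75, ≤ 100) is an assumption. The ideal values for DO (14.6 mg/L) and pH (7) are those given with Eq. 2.

```python
def wawqi(measured, standard, ideal=None):
    """Weighted Arithmetic WQI for one sample, following Eqs. 1-4.

    measured: {parameter: concentration V_n}
    standard: {parameter: permissible value S_n}
    ideal:    {parameter: ideal value IV}; parameters not listed use IV = 0
    """
    ideal = ideal or {}
    # Eq. 4: K = 1 / sum(1 / S_n), taken over the parameters actually measured
    k = 1.0 / sum(1.0 / standard[p] for p in measured)
    weighted_sum = 0.0
    weight_total = 0.0
    for p, v_n in measured.items():
        s_n = standard[p]
        iv = ideal.get(p, 0.0)
        q_n = 100.0 * (v_n - iv) / (s_n - iv)  # Eq. 2: quality rating
        w_n = k / s_n                          # Eq. 3: unit weight
        weighted_sum += q_n * w_n
        weight_total += w_n
    return weighted_sum / weight_total         # Eq. 1


def classify_wawqi(value):
    """Map an index value to the five classes (inclusive upper bounds assumed)."""
    for upper, label in [(25, "excellent"), (50, "good"),
                         (75, "poor"), (100, "extremely poor")]:
        if value <= upper:
            return label
    return "unfit for consumption"


if __name__ == "__main__":
    # Hypothetical sample and standards, for illustration only (not Table 2).
    sample = {"pH": 7.5, "DO": 6.0, "turbidity": 20.0}
    standards = {"pH": 8.5, "DO": 5.0, "turbidity": 25.0}
    ideals = {"pH": 7.0, "DO": 14.6}
    index = wawqi(sample, standards, ideals)
    print(round(index, 1), classify_wawqi(index))
```

    Because the unit weights are inversely proportional to the standards, parameters with small permissible values dominate the index. The sketch applies Eq. 2 as written, without taking an absolute value, so a parameter measured below its ideal value (for example, pH below 7) yields a negative rating.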
    Table 2 Threshold values for WAWQI calculation.

    Table 3 Water quality index (WAWQI) classification scheme for drinking water quality assessment.

    Comprehensive pollution index (CPI) – ecosystem health

    The assessment of river water quality for aquatic ecosystems and ecosystem health was conducted using the Comprehensive Pollution Index (CPI), as defined in Eqs. 5 and 6 and applied in previous studies8,36. To account for local conditions, we used the Uganda National Bureau of Standards35 as a reference for natural potable water standards. Nine parameters (i.e., DO, pH, turbidity, EC, NH₄⁺-N, NO₃⁻-N, SRP, Cl⁻, and temperature) were used. The standard values for computing the CPI are the same as those used for the WAWQI (Table 2), except for DO (8 mg/L), Cl⁻ (120 mg/L), and temperature (27 °C).

    The CPI is calculated as follows:

    $$\mathrm{CPI} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{PI}_i$$
    (5)

    with
    n = the number of monitoring parameters or selected pollutants;
    PIi = the single-factor pollution index of the ith measured parameter;
    i = the index of the monitoring parameter (i = 1, …, n).

    The single-factor pollution index (PIi) is calculated according to the following equation:

    $$\mathrm{PI}_i = \frac{C_i}{S_i}$$
    (6)

    with
    Ci = the measured concentration of parameter i in water;
    Si = the standard value of the ith parameter based on international standard guidelines for drinking purposes and aquatic life (CCME).

    Similar to the WAWQI, the CPI categorises pollution levels into five categories (Table 4).

    Table 4 CPI-based water quality classification scheme (Ecosystem Health).

    Combined risk assessment for communication and water resource management

    To provide a holistic assessment of water resource health, a combined risk framework was developed. This approach integrates the findings from the WAWQI (drinking water suitability) and the CPI (ecosystem health) into a single, stakeholder-oriented classification that facilitates communication and management. By considering both human-use and ecological endpoints, this framework allows for a more nuanced understanding of the pressures on the water system and helps prioritise management actions.

    Five risk levels were defined, classifying each station based on the combined status of its drinking water and ecosystem health indicators (Table 5). A Low Risk classification was assigned only to stations where both drinking water quality was high (WAWQI ≤ 50) and ecosystem impact was minimal (CPI ≤ 0.4). Conversely, a station was immediately classified as Severe Risk if either its drinking water was deemed unfit for consumption (WAWQI > 100) or its ecosystem was severely impacted (CPI > 2.0). Intermediate risk levels were defined to distinguish between primary threats. A Moderate Risk (Ecosystem) classification indicates stations with good drinking water quality (WAWQI ≤ 50) but where ecological health is compromised (0.4 < CPI ≤ 2.0). In contrast, a Moderate Risk (Drinking) classification highlights stations with poor drinking water quality (WAWQI > 50) but where ecosystem health remains good (CPI ≤ 0.4). 
Finally, a High Risk classification was assigned to stations where both drinking water quality and ecosystem health were simultaneously degraded, representing a multi-faceted management challenge. This integrated assessment provides a basis for evaluating and communicating the complex trade-offs and combined impacts affecting the water body.

Table 5 Risk assessment matrix combining the weighted arithmetic water quality index (WAWQI) and the comprehensive pollution index (CPI) to classify overall water resource health.

Results

Spatial variability of physicochemical parameters

Figure 3 and Appendices A1–A3 provide a detailed summary of the variations in the measured physicochemical parameters across the studied locations. The figure shows the overall distribution of the data, including any outliers identified at particular stations. The appendices offer quantitative evaluations, including the mean values and standard deviations of the observed water quality parameters.

The highest mean temperature of 22.15 ± 2.65 °C was recorded at Katuna Station (M12), while Butobere Station (M13) showed the lowest mean temperature at 18.27 ± 1.43 °C. Turbidity levels ranged from 17.44 ± 7.2 NTU at Ihanga West Station (M6) to 82.38 ± 66.69 NTU at Maziba Dam (M16). The maximum DO content was observed at Ihanga West (M6) at 6.91 ± 1.15 mg/L, whereas Lower Bugongi (M8) had the lowest at 4.31 ± 1.44 mg/L. Butobere (M13) recorded the highest mean EC value of 262.25 ± 74.99 µS/cm, while Kabanyonyi (M15) showed the lowest at 73.50 ± 23.72 µS/cm. The highest total dissolved solids (TDS) level was measured at Butobere (M13) with 172.28 ± 61.65 mg/L, and the lowest at Kabanyonyi (M15) with 51.44 ± 16.6 mg/L.
The Kruskal-Wallis test indicated statistically significant differences in temperature, DO, turbidity, EC, and TDS across the study stations (p < 0.05).

The Maziba Dam station (M16) recorded the highest TSS value (298.75 ± 343.81 mg/L), followed by Katuna (M12; 287.75 ± 288.54 mg/L), while Ihanga West (M6) had the lowest mean value of 52.42 ± 30.99 mg/L. Mean chlorophyll a (Chl-a) values ranged from 1.95 ± 0.63 µg/L at Ihanga West (M6) to 6.55 ± 2.85 µg/L at the Ihanga Full Gospel Church (FGC) station (M5). The pH values varied from 6.80 ± 0.24 at Lower Bugongi (M8) to 7.30 ± 0.30 at Ihanga West (M6). Kabanyonyi (M15) also showed the lowest sodium and total hardness values, while Butobere (M13) and Hakakondogoro (M1) had the highest. Additionally, Kabanyonyi (M15) exhibited the lowest values for Na⁺, K⁺, Cl⁻, and SO₄²⁻, with Lower Bugongi (M8) registering the highest values, except for sulphates. The Kruskal-Wallis test indicated significant differences in Na⁺, K⁺, pH, Cl⁻, SO₄²⁻, total suspended solids (TSS), and total hardness (TH) values (p < 0.05) across the study stations.

Figure 3 and Appendix A3 also show variations in nutrient parameters across the study stations. TN values ranged from 2.95 ± 0.88 mg/L at Kabanyonyi (M15) to 4.91 ± 0.87 mg/L at Brazin Forest station (M9). The highest NO₃⁻-N concentration was at Brazin Forest (M9; 3.46 ± 0.78 mg/L), followed by Butobere (M13; 3.01 ± 1.31 mg/L), with Hakakondogoro (M1) recording the lowest value (1.91 ± 0.74 mg/L). The mean NO₂⁻-N levels were lowest at Mukirwa (M3; 0.01 ± 0.01 mg/L) and Ihanga West (M6; 0.11 ± 0.12 mg/L). Mean NH₄⁺-N levels ranged from 0.07 ± 0.04 mg/L at Ihanga West to 0.41 ± 0.32 mg/L at Hakakondogoro. For TP, values ranged from 0.16 ± 0.19 mg/L at Ihanga West (M6) to 0.40 ± 0.46 mg/L at Kyanamira (M14) and 0.40 ± 0.57 mg/L at Maziba Dam (M16).
Butobere (M13) recorded the lowest SRP value of 0.03 ± 0.02 mg/L, while the highest values were observed at Kakore East (M2; 0.11 ± 0.10 mg/L), Lower Bugongi (M8; 0.11 ± 0.08 mg/L), and Brazin Forest (M9; 0.11 ± 0.19 mg/L) (Fig. 3). Statistically significant differences were observed in TN, NO₃⁻-N, NO₂⁻-N, and NH₄⁺-N (p < 0.05), but not in SRP or TP (p > 0.05), across the study stations.

Fig. 3 Spatial distribution of water quality parameters across 16 monitoring stations (M1–M16) in the Maziba catchment. Box plots show median (horizontal line), interquartile range (box), whiskers (1.5 × IQR), outliers (black dots), and mean values (white diamonds). Parameters are organised by category: physical parameters (temperature, dissolved oxygen, turbidity, TSS, electrical conductivity, chlorophyll-a), general chemistry (pH, TDS, total hardness), major ions (sodium, potassium, chloride, sulphates), and nutrients (phosphorus and nitrogen forms). Twelve samples were collected per station between July 2023 and June 2024, totalling n = 192.

Temporal variability in water quality parameters

Figure 4 and Appendices A4–A6 show the temporal changes in physicochemical parameters recorded during the study period, from July 2023 to June 2024. Based on these patterns, water quality parameters in the Maziba sub-catchment display complex seasonal fluctuations driven by both climate and human activities. Temperature reaches its lowest levels in August–September (~17–19 °C) and peaks in March–May (~22–23 °C). Meanwhile, dissolved oxygen exhibits an opposite pattern, with the highest levels in August and September, and the lowest from January to July, reflecting temperature-dependent oxygen solubility. Turbidity and TSS reflect the influence of the wet season; turbidity remains low from January to September (except for a small peak in February), before rising sharply from October to December.
Meanwhile, TSS exhibits prominent peaks in February and November, with low levels in May. Electrical conductivity peaks in February and reaches its minimum in October and December, indicating dilution during peak rainfall periods. Chlorophyll-a concentrations stay high from January to May and September to December, with the lowest values during June to August, suggesting reduced algal productivity during the cooler dry season (Fig. 4 and Appendix A4).

Chemical parameters show varied responses: pH values are higher in January, March, and August, with lows in November and December; TDS follows EC patterns, being lower in October and December, while total hardness varies greatly, peaking in October. Major ions exhibit seasonal concentration and dilution cycles: sodium is lowest in October and December, potassium peaks in February and November but is low in March and December. Chloride remains relatively stable despite high variability and low values in October to December, and sulphates have higher levels in June, September, November, and December (Fig. 4 and Appendix A5).

Nutrient dynamics reveal particularly complex patterns: both total phosphorus and SRP peak in February, May, July, and November, with high variability during these months; total nitrogen is elevated in February, May, and June but lower in March, April, July, and October; nitrate-nitrogen peaks in May, June, and November, showing high variability in August; nitrite-nitrogen peaks in August and November; while ammonium-nitrogen is highest in May and September, with increased variability during these periods. These patterns suggest that water quality is mainly influenced by the interaction between rainfall-driven dilution and runoff processes, with the peaks observed in February and November across multiple parameters indicating periods of pollutant mobilisation during early wet season events.
Additionally, the high nutrient variability during specific months indicates episodic pollution inputs, likely from agricultural activities coinciding with planting and fertilisation cycles. Statistically significant differences were observed in WT, DO, turbidity, Chl-a, pH, EC, and TDS values (p < 0.05) across the study months.

Fig. 4 Temporal variation of water quality parameters in the Maziba sub-catchment from July 2023 to June 2024. Box plots show median (horizontal line), interquartile range (box), whiskers (1.5 × IQR), outliers (black dots), and mean values (white diamonds) for monthly samples collected from all 16 monitoring stations (n = 192).

All measured parameters showed significantly higher values during the dry season (i.e., January–February and June–August) than in the wet season (September–December and March–May), with exceptions noted for Chl-a, SRP, TN, turbidity, water temperature, and NH₄⁺-N (Fig. 5). The Mann-Whitney U test indicated statistically significant differences in mean values between the dry and wet seasons, except for NO₂⁻-N, NH₄⁺-N, TH, SO₄²⁻, TSS, DO, and TN (p > 0.05).

Fig. 5 Seasonal variability of physicochemical parameters in the Maziba sub-catchment, western Uganda.

Multivariate analysis of water quality parameters

Principal Component Analysis (PCA) identified six components with eigenvalues exceeding 1, collectively explaining 72.1% of the total variance (Table 6). Table 7 summarises the primary loadings for each component. PC1, which accounted for 27.10% of the variance, was dominated by high positive loadings of EC (0.916), TDS (0.922), Na⁺ (0.886), and K⁺ (0.727), and was therefore interpreted as representing salinity and ionic strength. PC2 explained 14.80% of the variance and was strongly associated with turbidity (0.876) and Chl-a (0.794), indicating turbidity and biological productivity.
PC3 accounted for 10.79% of the variance and was linked to high loadings for SRP (0.877) and TP (0.769), indicating phosphorus enrichment. PC4 contributed 7.52% of the variance, showing positive loadings for DO (0.742) and pH (0.708), along with a negative loading for water temperature (−0.726). This was interpreted as indicating a thermal-chemical balance in the river ecosystem. PC5, which explained 6.45% of the variance, was dominated by TH, indicating hardness and ionic minerals. PC6, representing 5.47% of the variance, was mainly associated with NH₄⁺-N, reflecting ammonium/nitrogen inputs.

Table 6 Eigenvalues of the correlation matrix.

Table 7 Summary of the main loadings for each component.

A Spearman correlation analysis was performed to examine the relationships among the measured water quality parameters (Table 8). The results showed that SRP and TP had statistically significant positive correlations with pH, EC, TDS, Na⁺, K⁺, NH₄⁺-N, and Cl⁻ (p < 0.05). Additionally, Cl⁻ and SO₄²⁻ displayed strong positive correlations with EC, TDS, TH, Na⁺ and K⁺ (p < 0.05). Furthermore, NO₂⁻-N and NH₄⁺-N were strongly positively correlated with turbidity, Chl-a, EC, TDS, and K⁺. Total nitrogen (TN) and NO₃⁻-N demonstrated strong positive correlations with all measured parameters except turbidity, Chl-a, and pH. Turbidity had a strong positive correlation with NO₂⁻-N, NH₄⁺-N, and TSS, while showing negative correlations with Chl-a, pH, EC, TDS, and TH (p < 0.05). Temperature exhibited a significant negative correlation with dissolved oxygen (DO), pH, TH, and SO₄²⁻. In addition, DO showed a negative correlation with Chl-a, pH, EC, TDS, TH, Na⁺, K⁺, NO₂⁻-N, NH₄⁺-N, SRP, TP, and TN.
TDS was positively correlated with EC, TH, Na⁺, K⁺, NO₂⁻-N, NH₄⁺-N, Cl⁻, SO₄²⁻, SRP, TP, NO₃⁻-N, TN, and TSS, but negatively correlated with DO (p < 0.05).

Table 8 Spearman’s correlation matrix between the physicochemical parameters of water in the Maziba sub-catchment.

Water quality indices

The water quality assessment reveals notable spatial and temporal differences across the 16 monitoring stations (Fig. 6). The Weighted Arithmetic Water Quality Index (WAWQI), which indicates the safety of drinking water, showed that station quality ranged from “Excellent” to “Very Poor” based on the average values (Fig. 6A). Although there was a tendency towards higher WAWQI values (poorer quality) during the wet season, this variation was not statistically significant (p > 0.05, Fig. 6B). Conversely, the Comprehensive Pollution Index (CPI), which reflects ecosystem health, exhibited a statistically significant increase during the wet season (p < 0.05), implying that runoff events probably contribute to ecological deterioration (Fig. 6D).

The integrated risk assessment, which combines both drinking water and ecosystem health indicators, is shown in Fig. 6E. The distribution of individual measurements and station averages across the risk quadrants confirms a strong link between the two indices; stations with high CPI values also have high WAWQI values, clustering in the “High Risk” and “Severe Risk” zones. As a result, none of the 16 stations could be classified as “Low Risk” (Fig. 6F). The most common category was High Risk, with eight stations (50.0%), characterised by the simultaneous decline in both drinking water quality and ecosystem health. An additional five stations (31.2%) were identified as Moderate Risk (Ecosystem), indicating areas where ecosystem health is affected even if drinking water quality remains acceptable.
The remaining three stations (18.8%) fell into the Severe Risk category, representing sites in critical decline that require urgent management action. No stations were classified as “Moderate Risk (Drinking)”, indicating that environmental pressures simultaneously harm both ecosystem health and water quality.

Fig. 6 Water quality index analysis for the 16 monitoring stations in the Maziba catchment. (A) Mean Weighted Arithmetic Water Quality Index (WAWQI) for drinking water suitability by station. Points represent the mean value, and error bars indicate the standard deviation. (B) Seasonal variation of WAWQI values, showing individual measurements as jittered points overlaid on boxplots. (C) Mean Comprehensive Pollution Index (CPI) for ecosystem health by station, with error bars indicating standard deviation. (D) Seasonal variation of CPI values, with statistical significance from a Wilcoxon test noted (p < 0.05). (E) Correlation plot of WAWQI versus CPI, with individual measurements (small points) and station averages (large, labelled points) coloured by their combined risk level. (F) Bar chart showing the distribution of stations across the five integrated risk categories, with the number and percentage of stations in each class. In panels A–E, dashed lines indicate quality thresholds.

The spatial mapping of water quality indices to their respective upstream sub-catchments reveals distinct patterns across the Maziba catchment (Fig. 7). The WAWQI distribution indicates that sub-catchments in the northern headwater regions generally exhibit poorer drinking water quality, with the exception of sub-catchments 3, 5, 6, and 7 in the north-west, which show better conditions (Fig. 7A). The CPI mapping indicates that all sub-catchments experience slight ecological impacts (Fig. 7B). The integrated risk assessment demonstrates that no sub-catchments qualify as low risk, with high- to severe-risk areas dominating the catchment (Fig. 7C).
This spatial pattern reflects the cumulative effects of land use practices, with urban discharge from Kabale town and intensive agriculture on steep slopes contributing to downstream water quality deterioration.

Fig. 7 Spatial distribution of water quality indices and risk assessment in the Maziba catchment. (A) Weighted Arithmetic Water Quality Index (WAWQI) mapped to sub-catchments upstream of monitoring stations, showing drinking water suitability classifications. The sub-catchments are numbered according to the sampling points.

Discussion

Variability in physicochemical parameters

Water temperature showed considerable variation among different sampling stations despite remaining within the ideal range for aquatic organisms, and it does not directly threaten drinking water quality37. However, temperatures exceeding 17 °C may promote the survival of pathogens such as Vibrio cholerae, as observed in communities vulnerable to cholera in Uganda38. The low surface water temperature at Butobere station is caused by the shade provided by nearby riparian vegetation and eucalyptus trees, which reduces sunlight reaching the water. Similarly, Kalny et al. (2017)39 found that shaded river reaches cooled downstream by up to 3.5 °C compared to unshaded areas, highlighting the vital role of riparian vegetation in regulating water temperatures. The fluctuating water temperatures have a significant impact on aquatic life, affecting metabolic rates, oxygen levels, and species distribution40. They further explain that higher temperatures can increase metabolic demands. If these demands are not met due to insufficient oxygen, sensitive species may experience stress or even mortality.

Dissolved oxygen (DO) levels showed significant differences across study stations. DO levels below 5 mg/L can stress aquatic organisms, indicating increased organic matter decomposition and oxygen use.
High DO levels at Ihanga West station suggest minimal organic loading and effective reaeration. Many physical, biochemical, biological, and ecological processes influence DO levels in rivers and streams. These include aeration and diffusion, oxygen production through photosynthesis, oxygen consumption via respiration, organic matter breakdown, and nitrification41. The balance between oxygen-consuming processes like organic matter decomposition and replenishing mechanisms such as atmospheric oxygen diffusion and photosynthesis controls DO concentrations in river ecosystems. The highest DO in August 2023 coincided with lower water temperatures, which help retain oxygen. The negative correlation between temperature and DO supports basic physicochemical principles outlined by Ibrahim and Abdulkarim (2017)42 in their study on Ajiwa reservoir, confirming that colder waters hold more dissolved oxygen across various stations. Likewise, Saturday et al. (2023)27 found significant seasonal changes in DO levels and water temperature in the Lake Mulehe sub-catchment.

The turbidity levels observed ranged from 17.44 ± 7.2 NTU at Ihanga West to 82.38 ± 66.69 NTU at Maziba Dam, both exceeding the WHO-recommended limits for drinking water and aquatic life, which are 5 NTU and 25 NTU, respectively. Maziba Dam Station represents the most downstream study station, where high turbidity levels are mainly linked to surface runoff from agricultural fields and high soil erosion on the steep hillsides upstream. High turbidity can shield harmful microorganisms from disinfection processes and is associated with increased microbial contamination, which poses significant risks to human health. High turbidity levels adversely affect the feeding mechanisms of aquatic life and reduce the photosynthetic efficiency of macrophytes. This relationship is supported by Nimusiima et al.
(2023)25, who found a significant correlation between high turbidity levels and increased concentrations of Escherichia coli and heavy metals in the Kagera River, with turbidity measurements exceeding the WHO guidelines. Furthermore, Sahani (2024)43 reported that 93.3% of wetland water sources exhibited turbidity, with only one water source (6.67%) remaining classified as clear (non-turbid) over an 18-month monitoring period in Rukiga District, situated within the Maziba catchment. The pH levels ranged from 6.80 ± 0.24 at Lower Bugongi to 7.30 ± 0.30 at Ihanga West, within the WHO (2018)37 recommended range of 6.5–8.5. Although pH alone does not fully determine water quality, these values suggest the water is safe for drinking and supports aquatic life38.

Electrical conductivity (EC) showed significant variations across the study stations, with Hakakondogoro and Butobere stations recording the highest levels. These differences reflect the influence of land use in the surrounding areas. Notably, both EC and TDS values remained within the safe limits established by the WHO, indicating that the salinity and mineral content in the water are low. High conductivity levels can be attributed to human activities, mainly the discharge of domestic wastewater and agricultural runoff, which increase the concentration of dissolved salts and nutrients. These findings align with those of Musungu et al. (2023)44, who identified a correlation between high EC values and runoff from tea estates, subsistence farming, and commercial agriculture, particularly during the wet season. Studies near Kampala and Lake Victoria have documented peaks in nitrate levels associated with fertiliser application, particularly the NPK 17:17:17 formulation, and runoff from urban agriculture45. Similarly, Njue et al.
(2022)46 found that significant EC spikes downstream of irrigation schemes in the Thiba River basin correlated with the use of inorganic fertilisers, highlighting the impact of agricultural practices on water quality. The high EC values observed during the dry season are due to concentration effects resulting from lower flow rates, whereas the wet season results in dilution. Likewise, Uwimana et al. (2017)47 reported higher EC during dry periods, linking this to farming activities, especially in rice and vegetable cultivation areas in the Migina Catchment, Rwanda. Saturday et al. (2023)27 also observed similar results in Lake Mulehe, which they attributed to reduced water flow and increased concentrations of dissolved ions during the dry months.

Total Suspended Solids (TSS) levels varied considerably across the study stations, with Maziba Dam and Katuna exhibiting the highest TSS levels, which far exceeded the usual threshold of 25 to 80 mg/L for healthy freshwater ecosystems48. The hilly terrain and poor agricultural practices, including limited vegetation cover, worsen erosion and lead to increased sedimentation in the study area. High TSS levels can damage benthic habitats and reduce light penetration, hindering photosynthesis in aquatic organisms. Furthermore, elevated TSS can clog fish gills, impairing their ability to breathe and feed. At the Muvumba hydropower dam, sediment accumulation reduces reservoir capacity and impacts hydropower efficiency. The long-term sustainability of this dam may be at risk unless effective measures, such as dredging or upstream restoration, are adopted. The high TSS values observed during the rainy season also indicate ongoing sediment influx due to urban expansion and unpaved roads, which significantly contribute to sedimentation in rivers and streams during rainfall. In the River Rwizi sub-catchment, Ojok et al.
(2017)49 noted elevated TSS levels during the rainy season, attributed to soil erosion from agricultural activities and urban runoff, which increases sediment loading in the river. Additionally, increased K⁺ levels are often linked to human waste, as inadequate sanitation practices in Uganda’s peri-urban markets lead to higher potassium concentrations in nearby water bodies.

The nutrient dynamics exhibited complex patterns influenced by hydrological processes and localised nutrient inputs. The highest concentrations of NO₃⁻-N, recorded at Brazin Forest and Butobere, were below the WHO limit of 50 mg/L for NO₃⁻ levels34. However, they still suggest nutrient contributions from untreated sewage and from runoff carried by wastewater channels serving nearby homes and agricultural fields that drain into the river system. Elevated NH₄⁺-N levels at Hakakondogoro station may indicate a substantial increase in organic matter from neighbouring animal farms in adjacent sub-watersheds or inadequate nitrification processes due to ineffective ammonium absorption. High levels of SRP, TP, and TN at Maziba Dam and Lower Bugongi stations are associated with household wastewater and agricultural runoff, indicating nutrient enrichment that could lead to algal blooms and eutrophication, thereby negatively impacting water quality and aquatic life. These findings align with Valeriani et al. (2015)50, who identified agricultural activities and residential discharges as primary sources of phosphorus in river ecosystems.

Significant variations in nutrient concentrations and seasonal patterns were observed, driven by hydrological and biogeochemical processes. These findings emphasise that transport pathways and biogeochemical processing of nutrient forms can differ notably between wet and dry seasons, affecting water quality management. In December 2023, SRP, NO₃⁻-N, and TP reached their highest levels, while TN peaked in February 2024.
These peaks coincided with months of lower flow conditions, typical of the dry season, which likely concentrate both point and non-point nutrient inputs. Saturday et al. (2023)27 reported high nutrient levels, particularly SRP, TP, and TN, during the wet season, due to agricultural runoff from the surrounding area in the Lake Mulehe sub-catchment. Furthermore, similar nutrient fluctuations were recorded along the Ugandan stretch of the Kagera Transboundary River, where increased agricultural runoff was linked to elevated NO₃⁻-N levels, underscoring the significant impact of land-use practices on nutrient dynamics in the river system. This variability in nutrient concentrations highlights the need for effective management strategies to mitigate eutrophication risks during periods of nutrient enrichment.

Correlation analysis of water quality parameters

Principal Component Analysis (PCA) identified six principal components (PCs) that explain 72.1% of the variance in water quality metrics. The first principal component (PC1) shows a strong correlation with salinity and nutrient pollution. High scores on PC1 indicate increased levels of EC, TDS, and specific ion concentrations (notably Na⁺ and K⁺), as well as higher concentrations of nitrate nitrogen and total nitrogen. Research suggests that in Uganda’s freshwater systems, conductivity values in pristine rivers and lakes generally range from 100 to 500 µS/cm38. In contrast, Ling et al. (2017)51 identified six components accounting for 83.6% of the variance in their analysis, with primary pollutants linked to logging activities being total suspended solids, turbidity, and hydrogen sulfide. Their second component was associated with domestic pollution indicators, such as biochemical oxygen demand and phosphorus levels.

The Spearman correlation analysis revealed significant relationships among various water quality parameters.
Temperature showed a significant negative correlation with DO levels, demonstrating that oxygen solubility decreases as temperatures rise. This pattern is supported by Szewczyk et al. (2023)52, who observed that higher water temperatures in the Waccamaw River in the Pee Dee Basin of the Southeastern United States led to lower DO levels. Additionally, a notable negative correlation was found between TSS and Chl-a concentrations. This suggests that turbidity levels increase as suspended sediments from surface runoff accumulate. The increased turbidity likely reduces light penetration in the water, thereby hampering photosynthesis and primary productivity. These results emphasise the complex interconnection of physical, chemical, and biological processes, all of which play a vital role in affecting water quality in our study area. Phosphorus species, including SRP and TP, correlated strongly with pH, EC, TDS, and major ions such as Na⁺, K⁺, and Cl⁻. These findings indicate that agricultural and municipal runoff are the main sources of phosphorus. This enrichment probably results from the use of fertilisers and urban effluents, which increase ion concentrations and phosphates. Nitrogen species, such as NO₂⁻-N, NH₄⁺-N, and TN, showed strong positive correlations with turbidity, chlorophyll-a, EC, and TDS. This reflects the impact of nutrient enrichment from fertilisers and organic matter on algal growth and particulate matter. Similarly, Umuhoza et al. (2024)53 reported that TN and TP were strongly associated with turbidity, EC, and TDS in the Nyabarongo River of Rwanda.

While anthropogenic activities are the primary drivers of water quality degradation in the Maziba catchment, suspended sediments may contribute to pollutant mobilisation from underlying geology.
Elevated TSS concentrations during the wet season indicate substantial erosion, with suspended particles adsorbing and transporting geogenic contaminants such as heavy metals and trace elements from the metamorphic and sedimentary rocks of southwestern Uganda.

Water quality indices and their implications

The Weighted Arithmetic Water Quality Index (WAWQI) indicated that most stations (69%) had water quality unsuitable for drinking, with classifications ranging from “poor” to “unfit.” A notable contrast was observed in headwater catchments: while four such stations exhibited “excellent” or “good” quality, reflecting cleaner conditions, three others were considered “unfit.” This significant degradation is attributed to specific anthropogenic pressures, namely urban influences from Kabale town affecting Lower Bugongi (M8) and Brazin Forest (M9) stations, and intensive agriculture impacting Hakakondogoro station (M1).

The WAWQI generally shows significant temporal and spatial differences in drinking water quality across various study locations. These differences highlight the important influence of local factors, especially urban wastewater discharges and agricultural runoff, on the physicochemical properties of water. For example, Sanusi et al. (2024)54 found that during the wet season in the Kampala/Mbarara region, the proportion of sampling sites classified as “excellent” decreased sharply, with some sites deemed unsuitable for consumption due to high pollutant levels. Similarly, Saturday et al. (2021)26 reported that the lower catchment areas of Lake Bunyonyi consistently had lower WQI scores compared to upstream sites. Their findings showed that wastewater discharge from trading centres and agricultural runoff increased nutrient and contaminant levels in downstream locations26.
These observations demonstrate that urban and peri-urban areas, along with downstream sites, often show higher pollution indices, emphasising the usefulness of WAWQI in capturing spatial differences and seasonal changes in tropical watersheds54.

The variations in water quality are further affected by urban stormwater runoff and rainfall-induced discharge. Intense tropical rainfall tends to mobilise nutrients and pollutants from land surfaces into nearby water bodies. For example, satellite monitoring over Lake Victoria has shown that increased rainfall correlates with greater erosion and nutrient transport (including nitrogen and phosphorus) into the lake, promoting algal blooms and increased turbidity45. Besides the impacts of natural rainfall, urbanisation and agricultural expansion worsen these issues. Urban runoff, along with poor sanitation, often introduces untreated sewage and agrochemicals into water bodies, while farming practices contribute to the runoff of fertilisers and pesticides. Research indicates that runoff from suburban Kampala during rainy periods, along with discharges from markets, leads to rapid increases in WQI values54. Interestingly, our WAWQI data show a slight decline in water quality during the rainy season, although the difference is not statistically significant.

The CPI evaluated water suitability for aquatic life, showing that it was, on average, “slightly impacted” across the study stations. Seasonal analysis revealed greater variability, with results ranging from “minimal impact” to “moderately impacted”. These findings indicate that, despite human pressures such as farming activities, the ecological health of aquatic environments generally remains relatively stable.
Similarly, Mekonnen and Tekeba (2024) observed that the Shinta River in Ethiopia, affected by both brewery and municipal waste, was rated as merely “minimally impacted” (CPI around 0.2–0.4) in its upper reaches but showed moderate to severe pollution levels (CPI approximately 0.8–5.0) downstream, affecting drinking water and aquatic life8. Seasonal fluctuations amplify these patterns: during the rainy season, runoff transports nutrients and organic materials into rivers, worsening pollution metrics. For example, Chen et al. (2022)55 discovered that during the wet season in Mwanza Gulf (Lake Victoria), agricultural runoff and urban effluents led to elevated nitrogen and phosphorus levels, resulting in the lowest water quality index (WQI) scores.

Overall, the results of this study agree with findings from both tropical and global research, showing that human activities (such as urban wastewater and fertilisers) cause higher WAWQI/CPI scores, which indicate poorer water quality, especially during heavy rain. For example, Podlasek et al. (2025)56 reported average WQI values between 63 and 97 and CPI values from 0.56 to 0.88 in surface waters affected by landfills, which usually signal good water quality. Combining index methods effectively monitors changes in water quality over time and across locations, highlighting the need to control pollution from urban and agricultural sources to improve index ratings and protect ecosystems.

The integrated risk assessment offers a holistic insight into the nature of environmental pressure within the Maziba catchment. The complete absence of any “low-risk” stations underscores that the entire monitored system is affected by water quality degradation. The strong linkage observed between the CPI and WAWQI values, with a majority of stations classified as “High Risk” (50.0%) or “Severe Risk” (18.8%), indicates that the sources of pollution are not specialised.
Instead, they appear to be degrading the water from both ecological and human-use perspectives simultaneously, a characteristic of mixed-contaminant sources, namely agricultural and urban runoff. No stations are classified as “Moderate Risk for drinking”. This finding suggests that in the Maziba system, ecological health is not a standalone early warning indicator that degrades long before drinking water suitability is compromised. The environmental pressures are such that by the time ecological indicators decline, parameters relevant to human consumption are already, or are simultaneously, impacted. This concurrent degradation implies that management strategies must be integrated to address the issue effectively. Interventions aimed solely at ecosystem restoration or at improving drinking water quality in isolation may be inefficient; instead, actions that reduce agricultural runoff, manage urban wastewater, and control erosion are required to improve the overall health of the watershed and safeguard both ecosystem functions and public health.ConclusionThis study provides a detailed evaluation of water quality across sixteen watersheds in the Maziba sub-catchment, using monthly monitoring data from July 2023 to June 2024 to examine spatiotemporal variations. The findings reveal significant ionic and nutrient enrichment, with Principal Component Analysis indicating that six components account for 72.1% of the total variance. The first component confirms that pollution is primarily driven by salinity and nutrient indicators (EC, TDS, Na⁺, Cl⁻, K⁺, NO₃⁻-N), which are linked to agricultural runoff and urban wastewater, especially during the rainy seasons. The application of the Weighted Arithmetic Water Quality Index (WAWQI) and the Comprehensive Pollution Index (CPI) simplifies complex data into understandable values, facilitating communication of water quality to stakeholders. 
The assessment highlights a critical situation in Maziba: the WAWQI classifies 69% of the stations as having water that is “poor” to “unfit” for drinking. The combined risk assessment corroborates this, categorising most sites as “High Risk” (50.0%) or “Severe Risk” (18.8%), with no stations falling into the “Low Risk” category. The framework reveals a concurrent decline in drinking water quality and ecosystem health, with the direct implication for water management being that interventions must be integrated. Strategies should comprehensively address both diffuse agricultural runoff and urban pollution to improve the catchment’s condition. Furthermore, this study addresses the regional issue of data scarcity by making the complete dataset freely accessible to support future research and monitoring efforts.This study has limitations affecting the interpretation of results. Excluding microbiological indicators, such as E. coli, prevents the assessment of faecal contamination and associated health risks. The focus on physicochemical parameters may overlook contaminants such as pesticides or emerging pollutants. A one-year monitoring period limits understanding of long-term trends and seasonal patterns. Results are from specific sites in the Kigezi Highlands and may not represent wider conditions due to site-specific factors and hydrological variability. The WAWQI/CPI and risk assessment methods simplify complex data, potentially obscuring individual pollutant effects. Future research should include microbiological tests, longer monitoring periods, and broader spatial coverage to assess water quality in tropical agricultural catchments better.
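The two indices named in the conclusion can be illustrated with a short sketch. This is not the authors' implementation: it assumes the commonly used forms CPI = (1/n) Σ C_i/S_i and WAWQI = Σ W_i Q_i / Σ W_i, with Q_i = 100 · C_i/S_i (ideal value taken as zero) and W_i proportional to 1/S_i. The function names and the permissible limits in `limits` are hypothetical placeholders; the measured values are the station M8 means reported in Appendices A1 to A3. Parameters with a non-zero ideal value (pH, DO) are left out to keep the sketch short.

```python
from typing import Mapping


def comprehensive_pollution_index(measured: Mapping[str, float],
                                  limits: Mapping[str, float]) -> float:
    """CPI as the mean of C_i / S_i over all parameters that have a limit."""
    ratios = [measured[name] / limit for name, limit in limits.items()]
    return sum(ratios) / len(ratios)


def weighted_arithmetic_wqi(measured: Mapping[str, float],
                            limits: Mapping[str, float]) -> float:
    """WAWQI = sum(W_i * Q_i) / sum(W_i), with unit weights W_i = K / S_i."""
    k = 1.0 / sum(1.0 / limit for limit in limits.values())
    weighted_sum = 0.0
    weight_total = 0.0
    for name, limit in limits.items():
        quality_rating = 100.0 * measured[name] / limit  # Q_i, ideal value assumed 0
        unit_weight = k / limit                          # stricter limit, larger weight
        weighted_sum += unit_weight * quality_rating
        weight_total += unit_weight
    return weighted_sum / weight_total


# Station M8 (Lower Bugongi) means taken from the appendix tables.
m8_means = {"turbidity_ntu": 27.01, "tds_mg_l": 172.28,
            "no3_n_mg_l": 3.12, "cl_mg_l": 26.94}

# ASSUMPTION: placeholder permissible limits for illustration only,
# not the standards applied in the study.
limits = {"turbidity_ntu": 5.0, "tds_mg_l": 1000.0,
          "no3_n_mg_l": 10.0, "cl_mg_l": 250.0}

cpi = comprehensive_pollution_index(m8_means, limits)
wawqi = weighted_arithmetic_wqi(m8_means, limits)
print(f"CPI = {cpi:.2f}, WAWQI = {wawqi:.1f}")
```

Because the unit weights scale with 1/S_i, the parameter with the strictest limit (turbidity under these placeholder limits) dominates the WAWQI, whereas the CPI treats all ratios equally.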

    Data availability

    The data are published online and are available at https://doi.org/10.5281/zenodo.15465720.
    References

    1. Badrzadeh, N., Samani, J. M. V., Mazaheri, M. & Kuriqi, A. Evaluation of management practices on agricultural nonpoint source pollution discharges into the rivers under climate change effects. Sci. Total Environ. 838, 156643 (2022).
    2. Mwanake, H. et al. Agricultural practices and soil and water conservation in the transboundary region of Kenya and Uganda: farmers’ perspectives of current soil erosion. Agriculture 13, 1434 (2023).
    3. Mekonnen, Y. A., Tenagashawu, D. Y. & Tekeba, H. M. Evaluation of the physicochemical and microbiological current water quality status of Ribb reservoir, South Gondar, Ethiopia. Sustain. Water Resour. Manag. 9, 18 (2023).
    4. Ndhlovu, G. Z. & Woyessa, Y. E. Evaluation of streamflow under climate change in the Zambezi river basin of Southern Africa. Water 13, 3114 (2021).
    5. Tenagashaw, D. Y., Muluneh, M., Metaferia, G. & Mekonnen, Y. A. Land use and climate change impacts on streamflow using SWAT Model, middle Awash sub Basin, Ethiopia. Water Conserv. Sci. Eng. 7, 183–196 (2022).
    6. Aminiyan, M. M. & Aminiyan, F. M. Comprehensive integrated index–based geochemistry and hydrochemical analyses of groundwater resources for multiple consumptions under coastal conditions. Environ. Sci. Pollut Res. 27, 21386–21406 (2020).
    7. Lin, L., Yang, H. & Xu, X. Effects of water pollution on human health and disease heterogeneity: a review. Front. Environ. Sci. 10, 880246 (2022).
    8. Mekonnen, Y. A. & Tekeba, H. M. Analysis of water quality by comprehensive pollution index (CPI) and self-purification capacity of Shinta River, Ethiopia. Sustain. Water Resour. Manag. 10, 10 (2024).
    9. Ferronato, N. & Torretta, V. Waste mismanagement in developing countries: A review of global issues. Int. J. Environ. Res. Public. Health. 16, 1060 (2019).
    10. Galaburda, M. et al. Physicochemical and sorption characteristics of carbon biochars based on lignin and industrial waste magnetic iron dust. Water 15, 189 (2023).
    11. Zubaidah, T., Karnaningroem, N. & Slamet, A. The self-purification ability in the rivers of Banjarmasin, Indonesia. J Ecol. Eng 20, 177–182 (2019).
    12. Mosavi, S. S., Zare, E. N., Behniafar, H. & Tajbakhsh, M. Removal of amoxicillin antibiotic from polluted water by a magnetic Bionanocomposite based on carboxymethyl tragacanth gum-grafted-polyaniline. Water 15, 202 (2023).
    13. Feng, Z. et al. Analysis of water quality indexes and their relationships with vegetation using self-organizing map and geographically and temporally weighted regression. Environ. Res. 216, 114587 (2023).
    14. Gantayat, R. R., Viswanathan, P. M., Ramasamy, N. & Sabarathinam, C. Distribution and fractionation of metals in tropical estuarine sediments, NW Borneo: implication for ecological risk assessment. J. Geochem. Explor. 252, 107253 (2023).
    15. Bordalo, A. A., Teixeira, R. & Wiebe, W. J. A water quality index applied to an international shared river basin: the case of the Douro river. Environ. Manage. 38, 910–920 (2006).
    16. Horton, R. K. An index number system for rating water quality. J. Water Pollut Control Fed. 37, 300–306 (1965).
    17. Brown, R. M., McClelland, N. I., Deininger, R. A. & Tozer, R. G. A water quality index – do we dare? Water Sew. Works 117, 339–343 (1970).
    18. Nivesh, S. et al. Assessment of future water demand and supply using WEAP model in Dhasan river Basin, Madhya Pradesh, India. Environ. Sci. Pollut Res. 30, 27289–27302 (2022).
    19. Roy, S. et al. Assessing and modelling drinking water quality at the railway stations of Tripura, India, with a possible strategic solution. Arab. J. Geosci. 16, 98 (2023).
    20. Elsayed, S. et al. Application of irrigation water quality indices and multivariate statistical techniques for surface water quality assessments in the Northern Nile Delta, Egypt. Water 12, 3300 (2020).
    21. Xiang, J. et al. Applying multivariate statistics for identification of groundwater chemistry and qualities in the Sugan lake Basin, Northern Qinghai-Tibet Plateau, China. J. Mt. Sci. 17, 448–463 (2020).
    22. Xu, H., Gao, Q. & Yuan, B. Analysis and identification of pollution sources of comprehensive river water quality: evidence from two river basins in China. Ecol. Indic. 135, 108561 (2022).
    23. Alex, S. & Johnson, R. Analysis of bacteriological quality of domestic water sources in Kabale municipality. Western Uganda. 11, 581–594 (2019).
    24. Namatovu, H. K., Magumba, M. A. & Oyana, T. J. A water quality dataset of levels of metal, nutrient and anions in sample water points from sixteen selected urban and rural districts of Uganda. Data Brief. 50, 109601 (2023).
    25. Nimusiima, D., Byamugisha, D., Omara, T. & Ntambi, E. Physicochemical and microbial quality of water from the Ugandan stretch of the Kagera transboundary river. Limnol. Rev. 23, 157–176 (2023).
    26. Saturday, A., Lyimo, T. J., Machiwa, J. & Pamba, S. Spatio-temporal variations in physicochemical water quality parameters of Lake Bunyonyi, Southwestern Uganda. SN Appl. Sci. 3, 684 (2021).
    27. Saturday, A., Kangume, S. & Bamwerinde, W. Content and dynamics of nutrients in the surface water of shallow Lake Mulehe in Kisoro District, South–western Uganda. Appl. Water Sci. 13, 150 (2023).
    28. Kachroud, M., Trolard, F., Kefi, M., Jebari, S. & Bourrié, G. Water quality indices: challenges and application limits in the literature. Water 11, 361 (2019).
    29. Kükrer, S. & Mutlu, E. Assessment of surface water quality using water quality index and multivariate statistical analyses in Saraydüzü dam Lake, Turkey. Environ. Monit. Assess. 191, 71 (2019).
    30. Wang, J., Fu, Z., Qiao, H. & Liu, F. Assessment of eutrophication and water quality in the estuarine area of Lake Wuli, Lake Taihu, China. Sci. Total Environ. 650, 1392–1402 (2019).
    31. Saturday, A., Kangume, S., Stecher, G. & Herrnegger, M. Surface Water Quality Data Maziba Catchment, Western Uganda. Zenodo https://doi.org/10.5281/zenodo.15465720 (2025).
    32. APHA. Standard Methods for the Examination of Water and Wastewater. American Public Health Association (2023).
    33. Brown, R. M., McClelland, N. I., Deininger, R. A. & O’Connor, M. F. A water quality Index – Crashing the psychological barrier. In Indicators of Environmental Quality (ed. Thomas, W. A.) 173–182 (Springer US, 1972). https://doi.org/10.1007/978-1-4684-2856-8_15.
    34. WHO. Guidelines for drinking-water quality, 4th edition, incorporating the first and second addenda. (2022).
    35. UNBS. Uganda standards Template – World Trade Organization. Government of Uganda (2014).
    36. Imneisi, I. & Aydin, M. Water quality assessment for Elmali stream and Karaçomak stream using the comprehensive pollution index (CPI) in Karaçomak Watershed, Kastamonu, Turkey. Fresenius Environ. Bull. 27, 7031–7038 (2018).
    37. WHO. A global overview of National regulations and standards for drinking-water quality. World Health Organ (2018).
    38. Bwire, G. et al. The quality of drinking and domestic water from the surface water sources (lakes, rivers, irrigation canals and ponds) and springs in cholera prone communities of Uganda: an analysis of vital physicochemical parameters. BMC Public. Health. 20, 1128 (2020).
    39. Kalny, G. et al. The influence of riparian vegetation shading on water temperature during low flow conditions in a medium sized river. Knowl Manag Aquat. Ecosyst. 5, 418 (2017).
    40. Larance, S., Wang, J., Delavar, M. A. & Fahs, M. Assessing water temperature and dissolved oxygen and their potential effects on aquatic ecosystem using a SARIMA model. Environments 12, 25 (2025).
    41. Abdul-Aziz, O. I. & Gebreslase, A. K. Emergent Scaling of Dissolved Oxygen (DO) in Freshwater Streams Across Contiguous USA. Water Resour. Res. 59, e2022WR032114 (2023).
    42. Ibrahim, A. & Abdulkarim, B. Studies on the Physico-Chemical parameters and zooplankton composition of Ajiwa reservoir Katsina State, Nigeria. UMYU J. Microbiol. Res. 2, 1–8 (2017).
    43. Sahani, M. K. Impact of climate change and human actions on the turbidity of water from wetlands in Rukiga district in Uganda: implications for health and community involvement. Discov Environ. 2, 133 (2024).
    44. Musungu, P. C., Kengara, F. O., Ongeri, D. M. K., Abdullah, M. M. S. & Ravindran, B. Influence of agricultural activities and seasonality on levels of selected physico-chemical parameters and heavy metals along river Yala in Lake Victoria basin. Environ. Monit. Assess. 195, 1467 (2023).
    45. Nalumenya, B. et al. Assessing the potential impacts of contaminants on the water quality of Lake Victoria: two case studies in Uganda. Sustainability 16, 9128 (2024).
    46. Njue, J. M., Magana, A. M. & Githae, E. W. Effects of agricultural nutrients influx on water quality in Thiba river basin, a sub-catchment of Tana river basin in Kirinyaga County, Kenya. East. Afr. J. Agric. Biotechnol. 5, 69–89 (2022).
    47. Uwimana, A., Van Dam, A., Gettel, G., Bigirimana, B. & Irvine, K. Effects of river discharge and land use and land cover (LULC) on water quality dynamics in Migina Catchment, Rwanda. Environ. Manage. 60, 496–512 (2017).
    48. CCME. Canadian water quality guidelines for the protection of aquatic life. User’s Man (2001).
    49. Ojok, W., Wasswa, J. & Ntambi, E. Assessment of seasonal variation in water quality in river Rwizi using multivariate statistical techniques, Mbarara Municipality, Uganda. J. Water Resour. Prot. 9, 83 (2017).
    50. Valeriani, F., Zinnà, L., Vitali, M., Romano Spica, V. & Protano, C. River water quality assessment: comparison between old and new indices in a real scenario from Italy. Int. J. River Basin Manag. 13, 325–331 (2015).
    51. Ling, T. Y. et al. Application of Multivariate Statistical Analysis in Evaluation of Surface River Water Quality of a Tropical River. J. Chem. 2017, 1–13 (2017).
    52. Szewczyk, C. J., Smith, E. M. & Benitez-Nelson, C. R. Temperature sensitivity of oxygen demand varies as a function of organic matter source. Front. Mar. Sci. 10, 1133336 (2023).
    53. Umuhoza, M., Niu, D. & Li, F. Exploring seasonal variability in water quality of Nyabarongo river in Rwanda via water quality indices and DPSIR modelling. Environ. Sci. Pollut Res. 31, 44329–44347 (2024).
    54. Sanusi, I. O., Olutona, G. O., Wawata, I. G. & Onohuean, H. Analysis of groundwater and surface water quality using modelled water index and multivariate statistics in Kampala and Mbarara Districts, Uganda. Discov Environ. 2, 125 (2024).
    55. Chen, S. S., Kimirei, I. A., Yu, C., Shen, Q. & Gao, Q. Assessment of urban river water pollution with urbanization in East Africa. Environ. Sci. Pollut Res. 29, 40812–40825 (2022).
    56. Podlasek, A., Koda, E., Kwas, A., Vaverková, M. D. & Jakimiuk, A. Anthropogenic and natural impacts on surface water quality: the consequences and challenges at the nexus of waste management Facilities, industrial Zones, and protected areas. Water Resour. Manag. 39, 1697–1718 (2025).
    Funding

    This research is supported by the Africa-UniNet project P082 (Land Use and Land Cover Change Effects on Water Quality Characteristics of the Maziba Sub-Catchment, Western Uganda). Africa-UniNet is financed by the Austrian Federal Ministry of Education, Science and Research and administered by OeAD-GmbH/Austria’s Agency for Education and Internationalisation.

    Author information

    Authors and Affiliations
    Faculty of Agriculture and Environmental Sciences, Kabale University, Kabale, Uganda: Mathew Herrnegger & Gabriel Stecher
    Institute of Hydrology and Water Management, Department of Landscape, Water and Infrastructure, BOKU University, Muthgasse 18, Vienna, 1190, Austria: Alex Saturday & Susan Kangume

    Contributions
    Alex Saturday collected data, drafted the manuscript text, and prepared the tables; Mathew Herrnegger drafted the manuscript and prepared the figures; Susan Kangume collected data and drafted the manuscript; Gabriel Stecher drafted the manuscript. All authors reviewed the manuscript.

    Corresponding author
    Correspondence to Alex Saturday.

    Ethics declarations

    Competing interests
    The authors declare no competing interests.

    Additional information

    Publisher’s note
    Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Appendices

    Appendix A1: Mean ± SD of physicochemical parameters at different study stations (n = 192)

    Station ID | Station | WT (°C) | DO (mg/L) | Tur (NTU) | EC (µS/cm) | TSS (mg/L) | Chl-a (µg/L)
    M1 | Hakakondogoro | 19.48 ± 3.24 | 4.36 ± 2.03 | 55.70 ± 51.78 | 233.75 ± 40.96 | 103.67 ± 88.46 | 5.75 ± 2.51
    M2 | Kakore East | 19.37 ± 1.78 | 4.53 ± 1.67 | 72.49 ± 67.38 | 233.25 ± 36.35 | 137.25 ± 14.08 | 6.10 ± 2.48
    M3 | Mukirwa | 19.91 ± 3.94 | 6.47 ± 1.89 | 30.50 ± 19.70 | 214.33 ± 72.29 | 145.67 ± 219.35 | 2.88 ± 1.14
    M4 | Ihanga FGC | 19.23 ± 2.19 | 5.55 ± 1.10 | 80.22 ± 76.14 | 212.92 ± 72.03 | 182.25 ± 108.27 | 6.55 ± 2.85
    M5 | Ihanga | 19.33 ± 2.04 | 6.58 ± 0.79 | 17.44 ± 7.22 | 162.83 ± 43.26 | 92.42 ± 87.36 | 2.19 ± 0.98
    M6 | Ihanga West | 19.17 ± 3.31 | 6.91 ± 1.15 | 17.84 ± 11.34 | 188.75 ± 61.27 | 52.42 ± 30.99 | 1.95 ± 0.63
    M7 | Rwakaraba | 19.71 ± 2.24 | 6.08 ± 1.11 | 75.64 ± 68.05 | 220.92 ± 72.65 | 175.33 ± 174.08 | 5.70 ± 2.19
    M8 | Lower Bugongi | 20.57 ± 1.90 | 4.31 ± 1.44 | 27.01 ± 15.24 | 246.17 ± 88.10 | 83.83 ± 38.02 | 3.48 ± 1.26
    M9 | Brazin Forest | 19.98 ± 1.41 | 5.48 ± 1.30 | 26.98 ± 13.07 | 206.00 ± 48.43 | 70.58 ± 21.23 | 3.07 ± 0.78
    M10 | Rugyendira | 20.63 ± 1.57 | 5.48 ± 0.94 | 77.74 ± 55.71 | 156.83 ± 59.90 | 198.75 ± 158.01 | 5.58 ± 1.61
    M11 | Muvumba | 21.43 ± 2.32 | 5.65 ± 1.04 | 79.34 ± 67.03 | 155.83 ± 56.60 | 223.58 ± 297.33 | 4.64 ± 2.19
    M12 | Katuna | 22.15 ± 2.65 | 5.15 ± 1.89 | 51.09 ± 43.12 | 198.00 ± 79.60 | 287.75 ± 288.54 | 3.80 ± 1.77
    M13 | Butobere | 18.27 ± 1.43 | 5.59 ± 0.72 | 28.83 ± 8.48 | 262.25 ± 74.99 | 118.00 ± 72.16 | 3.10 ± 0.55
    M14 | Kyanamira | 19.82 ± 1.42 | 5.88 ± 1.16 | 78.01 ± 61.80 | 177.58 ± 34.02 | 260.67 ± 250.61 | 5.91 ± 1.14
    M15 | Kabanyonyi | 19.15 ± 2.66 | 6.07 ± 1.10 | 51.44 ± 85.94 | 73.50 ± 23.72 | 90.75 ± 32.53 | 2.86 ± 0.52
    M16 | Maziba Dam | 20.95 ± 2.60 | 5.48 ± 1.47 | 82.38 ± 66.69 | 163.50 ± 38.51 | 298.75 ± 343.81 | 5.75 ± 1.45

    Appendix A2: Mean ± SD of physicochemical parameters at different study stations (n = 192) (Cont’d)

    Station ID | Station | pH | EC (µS/cm) | TDS (mg/L) | TH (mg/L) | Na⁺ (mg/L) | Cl⁻ (mg/L) | SO₄²⁻ (mg/L) | K⁺ (mg/L)
    M1 | Hakakondogoro | 6.84 ± 0.30 | 233.75 ± 40.96 | 163.58 ± 28.69 | 91.17 ± 12.11 | 11.00 ± 3.46 | 20.90 ± 8.66 | 29.47 ± 16.23 | 3.98 ± 1.23
    M2 | Kakore East | 6.97 ± 0.18 | 233.25 ± 36.35 | 163.29 ± 25.48 | 88.75 ± 15.96 | 10.89 ± 1.39 | 21.74 ± 4.65 | 28.59 ± 11.43 | 4.03 ± 1.22
    M3 | Mukirwa | 7.28 ± 0.25 | 214.33 ± 72.29 | 150.04 ± 50.60 | 82 ± 15.10 | 9.93 ± 3.21 | 24.34 ± 10.82 | 35.08 ± 12.61 | 3.25 ± 0.91
    M4 | Ihanga FGC | 6.92 ± 0.24 | 212.92 ± 72.03 | 149.05 ± 50.43 | 89.17 ± 14.67 | 9.66 ± 2.90 | 19.57 ± 6.69 | 31.19 ± 15.24 | 3.76 ± 1.28
    M5 | Ihanga | 6.99 ± 0.30 | 162.83 ± 43.26 | 113.88 ± 30.23 | 67.58 ± 11.18 | 6.86 ± 1.52 | 17.18 ± 6.13 | 27.64 ± 13.30 | 2.49 ± 0.28
    M6 | Ihanga West | 7.30 ± 0.30 | 188.75 ± 61.27 | 132.10 ± 42.88 | 76.08 ± 9.73 | 8.43 ± 2.64 | 23.49 ± 10.34 | 29.99 ± 11.25 | 3.29 ± 0.46
    M7 | Rwakaraba | 7.06 ± 0.15 | 220.92 ± 72.65 | 154.6 ± 50.82 | 86.25 ± 16.22 | 10.53 ± 3.27 | 20.65 ± 8.03 | 37.00 ± 23.38 | 3.49 ± 1.20
    M8 | Lower Bugongi | 6.80 ± 0.24 | 246.17 ± 88.10 | 172.28 ± 61.65 | 73.67 ± 12.92 | 14.60 ± 7.69 | 26.94 ± 10.97 | 26.05 ± 6.56 | 7.46 ± 4.43
    M9 | Brazin Forest | 6.96 ± 0.14 | 206.00 ± 48.43 | 152.87 ± 11.49 | 69.25 ± 19.13 | 10.82 ± 1.38 | 21.39 ± 4.30 | 25.69 ± 6.78 | 4.45 ± 0.79
    M10 | Rugyendira | 6.85 ± 0.20 | 156.83 ± 59.90 | 109.77 ± 41.93 | 64.83 ± 16.84 | 7.07 ± 2.87 | 14.46 ± 5.83 | 19.04 ± 9.47 | 3.32 ± 1.99
    M11 | Muvumba | 6.93 ± 0.17 | 155.83 ± 56.60 | 100.46 ± 41.99 | 64.50 ± 21.00 | 7.65 ± 4.53 | 17.48 ± 9.68 | 18.12 ± 6.43 | 2.88 ± 1.23
    M12 | Katuna | 6.80 ± 0.53 | 198.00 ± 79.60 | 138.60 ± 55.72 | 60.08 ± 26.27 | 10.68 ± 4.82 | 22.91 ± 10.06 | 21.65 ± 8.05 | 3.85 ± 1.49
    M13 | Butobere | 6.94 ± 0.22 | 262.25 ± 74.99 | 183.54 ± 52.47 | 82.00 ± 11.09 | 15.22 ± 5.06 | 31.73 ± 12.71 | 34.38 ± 9.51 | 4.98 ± 1.67
    M14 | Kyanamira | 6.91 ± 0.36 | 177.58 ± 34.02 | 124.25 ± 23.77 | 69.75 ± 13.77 | 9.22 ± 2.44 | 18.68 ± 3.93 | 25.57 ± 11.28 | 3.84 ± 1.59
    M15 | Kabanyonyi | 6.87 ± 0.29 | 73.50 ± 23.72 | 51.44 ± 16.60 | 53.58 ± 19.34 | 2.68 ± 0.53 | 4.37 ± 1.83 | 5.18 ± 3.06 | 1.51 ± 0.29
    M16 | Maziba Dam | 7.19 ± 0.37 | 163.50 ± 38.51 | 114.43 ± 26.93 | 71.08 ± 6.17 | 7.58 ± 1.67 | 16.89 ± 4.86 | 23.60 ± 10.52 | 3.08 ± 0.57

    Appendix A3: Mean ± SD of physicochemical parameters at different study stations (Cont’d)

    Station ID | Station | TP (mg/L) | TN (mg/L) | SRP (mg/L) | NO₂⁻-N (mg/L) | NH₄⁺-N (mg/L) | NO₃⁻-N (mg/L)
    M1 | Hakakondogoro | 0.30 ± 0.26 | 3.62 ± 1.11 | 0.08 ± 0.05 | 0.12 ± 0.20 | 0.41 ± 0.32 | 1.91 ± 0.74
    M2 | Kakore East | 0.34 ± 0.28 | 3.66 ± 0.66 | 0.11 ± 0.10 | 0.09 ± 0.05 | 0.22 ± 0.24 | 2.16 ± 0.70
    M3 | Mukirwa | 0.21 ± 0.14 | 3.68 ± 0.74 | 0.06 ± 0.05 | 0.01 ± 0.01 | 0.10 ± 0.07 | 2.68 ± 0.90
    M4 | Ihanga FGC | 0.26 ± 0.21 | 3.39 ± 0.90 | 0.08 ± 0.08 | 0.04 ± 0.03 | 0.13 ± 0.17 | 2.16 ± 0.79
    M5 | Ihanga | 0.18 ± 0.16 | 3.43 ± 0.71 | 0.05 ± 0.07 | 0.02 ± 0.01 | 0.10 ± 0.06 | 2.60 ± 0.77
    M6 | Ihanga West | 0.16 ± 0.19 | 3.60 ± 1.07 | 0.05 ± 0.03 | 0.01 ± 0.02 | 0.07 ± 0.04 | 2.79 ± 1.11
    M7 | Rwakaraba | 0.23 ± 0.21 | 3.13 ± 0.84 | 0.07 ± 0.05 | 0.03 ± 0.03 | 0.11 ± 0.10 | 2.12 ± 0.67
    M8 | Lower Bugongi | 0.23 ± 0.17 | 5.03 ± 1.71 | 0.11 ± 0.08 | 0.11 ± 0.12 | 0.32 ± 0.25 | 3.12 ± 1.14
    M9 | Brazin Forest | 0.28 ± 0.33 | 4.91 ± 0.87 | 0.11 ± 0.19 | 0.08 ± 0.09 | 0.30 ± 0.22 | 3.46 ± 0.78
    M10 | Rugyendira | 0.28 ± 0.27 | 3.78 ± 0.76 | 0.06 ± 0.06 | 0.11 ± 0.27 | 0.17 ± 0.12 | 2.59 ± 0.38
    M11 | Muvumba | 0.19 ± 0.16 | 3.93 ± 0.93 | 0.06 ± 0.05 | 0.07 ± 0.11 | 0.17 ± 0.11 | 2.64 ± 0.67
    M12 | Katuna | 0.21 ± 0.22 | 3.99 ± 1.77 | 0.06 ± 0.05 | 0.04 ± 0.03 | 0.16 ± 0.14 | 2.51 ± 1.57
    M13 | Butobere | 0.12 ± 0.09 | 4.33 ± 1.12 | 0.03 ± 0.02 | 0.02 ± 0.02 | 0.13 ± 0.16 | 3.01 ± 1.31
    M14 | Kyanamira | 0.40 ± 0.46 | 3.94 ± 1.40 | 0.08 ± 0.08 | 0.05 ± 0.05 | 0.18 ± 0.14 | 2.18 ± 1.10
    M15 | Kabanyonyi | 0.11 ± 0.08 | 2.95 ± 0.88 | 0.04 ± 0.03 | 0.04 ± 0.07 | 0.11 ± 0.08 | 1.93 ± 0.75
    M16 | Maziba Dam | 0.40 ± 0.57 | 4.52 ± 3.47 | 0.08 ± 0.08 | 0.07 ± 0.12 | 0.13 ± 0.07 | 2.33 ± 0.96

    Appendix A4: Mean ± SD of physicochemical parameters across sampling months (n = 192)

    Month | WT (°C) | DO (mg/L) | Turbidity (NTU) | Chl-a (µg/L) | pH | EC (µS/cm) | TDS (mg/L)
    Aug 2023 | 17.87 ± 1.89 | 7.41 ± 0.72 | 26.04 ± 12.84 | 3.00 ± 1.20 | 7.38 ± 0.21 | 199.75 ± 61.82 | 139.83 ± 43.27
    Sept 2023 | 19.10 ± 2.66 | 6.26 ± 1.88 | 43.12 ± 22.53 | 4.72 ± 1.74 | 7.04 ± 0.26 | 225.69 ± 74.65 | 157.88 ± 52.22
    Oct 2023 | 21.20 ± 2.99 | 6.05 ± 0.78 | 69.82 ± 61.29 | 4.74 ± 2.92 | 6.95 ± 0.24 | 128.31 ± 75.73 | 89.80 ± 53.00
    Nov 2023 | 19.23 ± 2.49 | 5.98 ± 1.20 | 106.79 ± 74.48 | 5.39 ± 2.65 | 6.87 ± 0.19 | 221.06 ± 66.97 | 154.74 ± 46.88
    Dec 2023 | 19.22 ± 2.44 | 5.89 ± 1.21 | 107.78 ± 74.84 | 5.94 ± 2.65 | 6.87 ± 0.19 | 102.75 ± 57.16 | 71.93 ± 40.01
    Jan 2024 | 18.97 ± 0.81 | 5.30 ± 1.79 | 58.53 ± 61.54 | 4.79 ± 2.35 | 7.21 ± 0.21 | 176.38 ± 48.00 | 123.44 ± 33.58
    Feb 2024 | 19.75 ± 2.56 | 5.23 ± 1.09 | 78.81 ± 50.18 | 6.16 ± 2.37 | 6.89 ± 0.55 | 234.75 ± 56.92 | 164.33 ± 39.85
    Mar 2024 | 21.44 ± 2.17 | 5.04 ± 1.10 | 33.09 ± 17.06 | 4.09 ± 1.43 | 7.07 ± 0.27 | 177.00 ± 69.99 | 123.90 ± 48.99
    Apr 2024 | 20.87 ± 1.55 | 4.89 ± 1.23 | 27.32 ± 9.02 | 3.91 ± 1.54 | 6.91 ± 0.23 | 203.06 ± 52.93 | 142.14 ± 7.05
    May 2024 | 21.36 ± 1.86 | 4.75 ± 1.18 | 56.07 ± 72.65 | 4.48 ± 1.76 | 6.84 ± 0.19 | 228.81 ± 62.11 | 160.17 ± 43.48
    June 2024 | 19.53 ± 2.29 | 5.37 ± 1.33 | 19.80 ± 5.69 | 2.83 ± 0.80 | 6.88 ± 0.23 | 216.06 ± 55.75 | 151.18 ± 39.07
    Jul 2024 | 20.71 ± 3.02 | 4.91 ± 1.90 | 13.33 ± 5.72 | 2.50 ± 1.10 | 6.83 ± 0.30 | 216.19 ± 58.44 | 151.33 ± 40.91
    All groups | 19.95 ± 2.50 | 5.60 ± 1.49 | 53.29 ± 55.79 | 4.33 ± 2.21 | 6.98 ± 0.31 | 194.15 ± 72.38 | 135.89 ± 50.66

    Appendix A5: Mean ± SD of physicochemical parameters across study months (continued)

    Month | TH (mg/L) | Na⁺ (mg/L) | K⁺ (mg/L) | Cl⁻ (mg/L) | SO₄²⁻ (mg/L) | TSS (mg/L)
    Aug 2023 | 76.13 ± 16.54 | 12.13 ± 6.65 | 3.71 ± 2.07 | 22.70 ± 9.38 | 23.38 ± 10.47 | 144.88 ± 120.57
    Sept 2023 | 76.56 ± 26.18 | 10.54 ± 4.44 | 3.63 ± 1.57 | 21.66 ± 8.98 | 39.13 ± 17.76 | 103.94 ± 53.23
    Oct 2023 | 84.63 ± 11.89 | 6.58 ± 3.22 | 2.94 ± 1.35 | 13.16 ± 9.91 | 18.21 ± 11.77 | 161.50 ± 133.27
    Nov 2023 | 76.13 ± 16.54 | 10.59 ± 4.54 | 4.35 ± 1.57 | 23.59 ± 8.95 | 32.11 ± 10.44 | 353.38 ± 308.84
    Dec 2023 | 76.56 ± 26.18 | 5.70 ± 3.65 | 2.64 ± 0.81 | 8.91 ± 5.09 | 39.13 ± 17.76 | 106.63 ± 48.00
    Jan 2024 | 72.38 ± 26.89 | 8.33 ± 2.80 | 3.36 ± 1.36 | 19.63 ± 8.91 | 16.93 ± 6.54 | 208.56 ± 361.83
    Feb 2024 | 67.25 ± 18.74 | 10.66 ± 4.57 | 6.37 ± 3.93 | 22.05 ± 7.53 | 28.56 ± 7.82 | 275.19 ± 283.37
    Mar 2024 | 75.25 ± 6.85 | 8.80 ± 4.07 | 2.97 ± 1.48 | 19.20 ± 9.43 | 20.97 ± 9.45 | 113.00 ± 49.76
    Apr 2024 | 69.00 ± 21.27 | 10.71 ± 4.28 | 3.36 ± 1.31 | 21.56 ± 8.30 | 20.61 ± 8.49 | 117.13 ± 46.12
    May 2024 | 70.31 ± 14.28 | 9.28 ± 3.53 | 4.10 ± 1.66 | 22.13 ± 9.33 | 19.61 ± 7.44 | 95.56 ± 92.88
    June 2024 | 68.00 ± 13.59 | 10.71 ± 3.86 | 3.58 ± 1.29 | 24.51 ± 10.04 | 34.25 ± 15.46 | 112.19 ± 91.90
    Jul 2024 | 80.13 ± 10.68 | 10.58 ± 3.70 | 3.74 ± 1.15 | 22.96 ± 9.16 | 20.79 ± 7.50 | 99.31 ± 73.04
    All groups | 74.36 ± 18.70 | 9.55 ± 4.47 | 3.73 ± 1.97 | 20.17 ± 9.66 | 26.14 ± 13.68 | 157.60 ± 187.84

    Appendix A6: Mean ± SD of physicochemical parameters across sampling months

    Month | NO₂⁻-N (mg/L) | NH₄⁺-N (mg/L) | SRP (mg/L) | NO₃⁻-N (mg/L) | TN (mg/L) | TP (mg/L)
    Aug 2023 | 0.20 ± 0.29 | 0.09 ± 0.06 | 0.02 ± 0.01 | 2.60 ± 1.28 | 3.24 ± 0.91 | 0.09 ± 0.06
    Sept 2023 | 0.04 ± 0.03 | 0.31 ± 0.23 | 0.05 ± 0.07 | 2.31 ± 0.60 | 3.94 ± 0.78 | 0.06 ± 0.09
    Oct 2023 | 0.03 ± 0.03 | 0.14 ± 0.17 | 0.05 ± 0.07 | 1.75 ± 0.72 | 2.45 ± 1.03 | 0.06 ± 0.09
    Nov 2023 | 0.11 ± 0.12 | 0.16 ± 0.10 | 0.07 ± 0.05 | 3.10 ± 0.57 | 4.40 ± 0.86 | 0.34 ± 0.33
    Dec 2023 | 0.04 ± 0.06 | 0.13 ± 0.20 | 0.03 ± 0.01 | 1.37 ± 0.53 | 3.99 ± 0.81 | 0.04 ± 0.02
    Jan 2024 | 0.03 ± 0.02 | 0.23 ± 0.13 | 0.06 ± 0.03 | 2.45 ± 0.59 | 4.02 ± 0.70 | 0.22 ± 0.13
    Feb 2024 | 0.02 ± 0.02 | 0.18 ± 0.15 | 0.09 ± 0.05 | 2.80 ± 0.74 | 5.79 ± 2.85 | 0.47 ± 0.29
    Mar 2024 | 0.03 ± 0.04 | 0.19 ± 0.16 | 0.08 ± 0.05 | 2.12 ± 1.26 | 2.93 ± 1.14 | 0.23 ± 0.17
    Apr 2024 | 0.05 ± 0.06 | 0.11 ± 0.09 | 0.05 ± 0.04 | 2.33 ± 0.83 | 3.23 ± 0.84 | 0.15 ± 0.11
    May 2024 | 0.04 ± 0.03 | 0.29 ± 0.18 | 0.07 ± 0.07 | 3.16 ± 1.13 | 4.62 ± 0.81 | 0.34 ± 0.15
    June 2024 | 0.03 ± 0.03 | 0.13 ± 0.20 | 0.18 ± 0.14 | 3.60 ± 0.61 | 4.43 ± 0.76 | 0.26 ± 0.17
    Jul 2024 | 0.03 ± 0.03 | 0.14 ± 0.27 | 0.10 ± 0.09 | 2.56 ± 0.58 | 3.39 ± 0.84 | 0.65 ± 0.46
    All groups | 0.06 ± 0.11 | 0.18 ± 0.18 | 0.07 ± 0.08 | 2.51 ± 1.00 | 3.87 ± 1.43 | 0.24 ± 0.27

    Rights and permissions
    Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
    About this article

    Cite this article
    Saturday, A., Herrnegger, M., Kangume, S. et al. Spatiotemporal variability of surface water quality in tropical agriculture-dominated catchments: insights from water quality indices. Sci Rep 15, 42971 (2025). https://doi.org/10.1038/s41598-025-27066-x

    Received: 19 July 2025
    Accepted: 31 October 2025
    Published: 02 December 2025
    Version of record: 02 December 2025
    DOI: https://doi.org/10.1038/s41598-025-27066-x

    Keywords
    Surface water quality; Weighted arithmetic water quality index; Comprehensive pollution index; Principal component analysis; Maziba catchment, Uganda

  • in

    Drinking water is at risk during warfare — better protections are needed

    In war, drinking water is vulnerable to attacks. Fuel shortages have crippled Gaza’s water infrastructure, depriving 96% of households of safe water. The destruction of the Kakhovka Dam in Ukraine in June 2023 cut off the supply for 700,000 people.
    Competing Interests
    The authors declare no competing interests.

  • in

    The response of meteorological drought to extreme climate in the water-receiving area of the Tao river diversion project in China

    Abstract

    Against the backdrop of increasing global extreme climate events, the water-receiving area of the Tao River Diversion Project in China frequently experiences drought disasters. This study therefore uses a meteorological drought index (SPEI) and 12 extreme climate indicators to analyze changes in drought and extreme climate and the effects of extreme climate on drought. The results show that SPEI12 presents a slow but persistent downward trend, with 1988 as the turning point. Drought frequency was lower in the northwest and higher in the southeast. Additionally, droughts occurred more frequently but were of shorter duration. Summer days (SU), warm days (TX90p) and warm nights (TN90p) increased significantly. Although extreme precipitation events increased, the change was not significant and thus could not effectively counteract the drought trend. Annual total precipitation (PRCPTOT), cold days (TX10p) and summer days (SU) are the main driving factors of meteorological drought, with contribution rates of 23.35%, 18.93% and 14.75%, respectively. PRCPTOT and SU have a linear relationship with meteorological drought, whereas extreme climate indices such as the warm spell duration index (WSDI) and warm nights (TN90p) show nonlinear relationships. In particular, consecutive wet days (CWD) show a pronounced nonlinear effect: values below 8.58 days are associated with wetting, and values above 8.58 days with drought. In response to pressures from a warmer and drier future climate, the agricultural sector can build adaptive capacity through the adjustment of planting structures, the promotion of water-saving technologies, and the incorporation of machine learning-derived thresholds into drought monitoring systems to enhance their early warning capabilities.
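Two of the extreme climate indicators named in the abstract are simple counting rules over daily station data. The sketch below assumes the widely used ETCCDI definitions (CWD as the longest run of days with precipitation of at least 1 mm, SU as the number of days with maximum temperature above 25 °C); the abstract does not state which definitions the study applied, and the function names and sample values are hypothetical.

```python
from typing import Sequence


def consecutive_wet_days(precip_mm: Sequence[float],
                         wet_threshold_mm: float = 1.0) -> int:
    """Longest run of consecutive days with precipitation >= threshold (CWD)."""
    longest = 0
    current = 0
    for amount in precip_mm:
        if amount >= wet_threshold_mm:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a dry day ends the current wet spell
    return longest


def summer_days(tmax_c: Sequence[float], threshold_c: float = 25.0) -> int:
    """Number of days with daily maximum temperature above the threshold (SU)."""
    return sum(1 for t in tmax_c if t > threshold_c)


# Hypothetical daily values for illustration only.
sample_precip = [0.0, 2.5, 1.0, 0.4, 3.0, 5.2, 1.1, 0.0]
sample_tmax = [24.0, 25.0, 26.5, 30.1]

cwd = consecutive_wet_days(sample_precip)
su = summer_days(sample_tmax)
print(f"CWD = {cwd} days, SU = {su} days")
```

Applied to one year of daily data per station, `cwd` is the kind of value that would be compared with the 8.58-day threshold reported above.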

    Similar content being viewed by others

    Comparative study on bivariate statistical characteristics of drought in Shandong using SPI and SPEI

    Article
    Open access
    02 April 2025

    Analysis of the spatial and temporal evolution of drought in Henan based on a nonlinear composite drought index

    Article
    Open access
    26 November 2024

    Spatial–temporal variation of extreme precipitation in the Yellow–Huai–Hai–Yangtze Basin of China

    Article
    Open access
    08 June 2023

    IntroductionAgainst the backdrop of accelerating global climate change, the frequency and intensity of extreme climate events have increased markedly worldwide. Among these, droughts in particular have been occurring more often and with greater impact each year. As a widespread natural hazard exacerbated by global climate change, drought significantly threatens ecosystem stability and sustainable agricultural development1. China stands as one of the regions most vulnerable and sensitive to these changes2, This is especially evident in the Loess Plateau and surrounding areas, where persistent drought has long constrained regional development. The Tao River Diversion Project, the largest inter-basin water transfer initiative in Gansu Province, has alleviated water scarcity issues in the receiving regions. However, against a background of rapid urbanization, increased rainfall variability, and more frequent extreme heat events, drought disasters are occurring more often and causing worsening economic losses. Traditional disaster management approaches are increasingly inadequate in addressing the compound challenges arising from the interplay of climate crises and human activities. Extreme climate events refer to those that fall in the tail of the statistical distribution of climate variables and have very low probabilities of occurrence. As a major category of such extremes, meteorological drought specifically describes a severe imbalance in the water budget caused by prolonged abnormal precipitation deficits coupled with anomalously high potential evapotranspiration.Defined by a sustained imbalance where precipitation is insufficient to meet evapotranspiration demands, meteorological drought is a precursor to other forms of drought and plays a vital role in drought monitoring3. 
Numerous quantitative indices have been established in drought research, such as the Standardized Precipitation Index (SPI)4, Standardized Precipitation Evapotranspiration Index (SPEI)5, Standardized Runoff Index (SRI)6, and Self-Calibrated Palmer Drought Severity Index (scPDSI)7. These indices correspond to different drought categories: SPI and SPEI typically characterize meteorological drought, and SRI corresponds to hydrological drought. An ongoing controversy, however, surrounds whether SPEI and scPDSI should be classified as indicators of meteorological or agricultural drought. Comparative studies have indicated that on a global scale, scPDSI demonstrates a superior capacity over SPEI for characterizing agricultural drought8.

    In recent years, considerable progress has been made in understanding the relationship between drought dynamics and extreme climate events. However, most studies have focused primarily on the impacts of extreme climate on agricultural and hydrological droughts, while in-depth analysis of the nonlinear interactions between meteorological drought and extreme climate remains limited. For instance, Jehanzaib et al. applied the non-stationary GAMLSS algorithm to reveal how climate change and human activities influence hydrological drought across different time scales9; Shah et al. used multiple regression and ridge regression to analyze the effects of surface temperature and precipitation on wheat and rice yields10; B. et al. adopted the comprehensive ranking OME method to prioritize extreme climate indices affecting agriculture in the U.S. Corn Belt11; Jamalzi et al. 
assessed drought conditions in Afghanistan using the SPI and SPEI indices and performed correlation analyses with climatic variables such as maximum temperature (Tx) and minimum temperature (Tn) using Pearson correlation and linear regression12; YIN & BAI quantitatively evaluated the contributions of extreme climate variables to multi-scale meteorological drought across different climatic zones in the Yellow River Basin, China. Their results indicated that the Rx5day index is the dominant factor contributing to meteorological drought at various time scales, accounting for approximately 65% of the influence13; QU et al. employed wavelet spectrum analysis to quantify extreme climate conditions and their impact on hydrological drought in the Xilin River Basin of Inner Mongolia, China. They found a warming and drying trend in the region, where a significant increase in extreme temperature events, coupled with a lack of synchronous increase in extreme precipitation, has intensified hydrological drought risks14; Sifang et al. investigated changes in compound hydrological drought and extreme high temperature (CHDHE) events in China’s semi-arid Luanhe River Basin from 1961 to 201615. Although various methods are available to quantify the relationship between drought and extreme climate, their association is often not simply linear. 
Moreover, while existing research has largely centered on the effects of extreme climate or climate change on hydrological and agricultural droughts, further analysis is needed to elucidate the mechanisms through which extreme climate influences meteorological drought.

    This study is designed to bridge this knowledge gap by pursuing three key objectives: first, to create a multidimensional portrait of the dynamics of extreme climate and meteorological drought by combining a suite of relevant indices; second, to decipher their interrelationships and lag effects using wavelet coherence and cross-wavelet analysis; and finally, to pinpoint the dominant drivers and unravel the nonlinear influence of extreme climate on drought through an integrated XGBoost-SHAP framework. Our goal is to clarify the underlying response mechanisms, providing actionable insights for this and other vulnerable regions worldwide.

    Materials and methods

    Overview of the study area

    The Tao River is the second-largest tributary of the upper reaches of the Yellow River in China, with a total length of 673 km. The water-receiving area of the Tao River Diversion Project is located in the central part of Gansu Province, China, lying in the transition zone from the hilly and gully area of the Loess Plateau to the Longxi Loess Plateau, with an altitude ranging from 1137 to 3858 m. Dominated by landforms such as loess ridges, loess hills, and gullies, this region features diverse terrain with significant undulations. It suffers from severe soil erosion and high ecological environment sensitivity, making it one of the key areas for soil and water conservation on the Loess Plateau. In terms of topography, it is higher in the northwest and lower in the southeast. The overview map of the study area is shown in Fig. 1.

    Fig. 
1 Overview of the Study Area. Note: The figure was made in ArcGIS 10.8 (https://desktop.arcgis.com/zh-cn/index.html) using a base map from the natural resources standard map service (http://bzdt.ch.mnr.gov.cn/). The approval number is GS(2024) 0650, and the boundary of the base map is not modified.

    Data sources and processing

    The Standardized Precipitation Evapotranspiration Index (SPEI) is sourced from the CHM_Drought dataset (https://zenodo.org/records/14634774). It is a raster dataset with a spatial resolution of 0.1° × 0.1°. This dataset is robust and capable of accurately capturing drought events across mainland China16. The extreme climate data are sourced from the National Cryosphere Desert Data Center of China (http://www.ncdc.ac.cn). It is a NetCDF (NC) dataset with a spatial resolution of 0.1° × 0.1°. Annual drought and extreme climate data from 1985 to 2018 were selected, and ArcMap 10.8 was then used to clip and mask the downloaded data.

    The extreme climate indices are drawn from the 27 indices defined by the Expert Team on Climate Change Detection and Indices (ETCCDI). In this study, 12 of these commonly used indices were selected for analysis17 (Table 1).

    Table 1 Definitions of extreme climate indicators.

    Research methods

    This study develops a comprehensive framework to analyze how meteorological drought responds to extreme climate conditions in the receiving area of China’s Tao River Diversion Project. The framework comprises three integrated modules, as illustrated in Fig. 2: trend and mutation analysis, drought characterization, and driving factor analysis.

    (a)

    Trend and Mutation Point Analysis: Interannual trends (1985–2018) in meteorological drought and twelve extreme climate indices were evaluated using a combination of linear regression, Sen’s slope test, and the Mann–Kendall test. The Innovative Trend Analysis (ITA) method provided graphical validation of these trends. Key mutation points in the time series were subsequently identified using the Sequential Mann–Kendall test.

    (b)

    Drought Characterization and Cycle Identification: Based on run theory, key characteristics—average duration, severity, and intensity—of drought events were extracted from the SPEI12 series. The spatial frequency of different drought grades was also mapped. Wavelet coherence and cross-wavelet analyses were then applied to identify resonance periods, phase relationships, and lag effects between SPEI12 and extreme climate indices across multiple time scales.

    (c)

    Driving Factor Analysis: A two-stage analytical approach was employed to identify key drivers. First, Pearson correlation analysis was used to prescreen extreme climate indices significantly associated with SPEI. Next, an optimized XGBoost model, tuned via Bayesian optimization, was constructed to predict SPEI12. The SHAP framework quantified the contribution of each influencing factor, while dependence plots illustrated the nonlinear effects and critical thresholds of the principal drivers.
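
As a rough illustration of the trend tests named in module (a), the following Python sketch computes the Mann–Kendall Z statistic and Sen’s slope for a single annual series. It is a minimal sketch under stated assumptions, not the authors’ code: the function names `mann_kendall_z` and `sens_slope` and the sample series are hypothetical, and the variance formula omits the tie correction, so it assumes the series contains no repeated values.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall_z(series):
    """Mann-Kendall Z statistic (no tie correction: assumes distinct values)."""
    n = len(series)
    # S sums the sign of (later - earlier) over all pairs with i < j.
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(series, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sens_slope(series):
    """Sen's slope: median of all pairwise slopes (x_j - x_i) / (j - i)."""
    indexed = list(enumerate(series))
    return median((xj - xi) / (j - i)
                  for (i, xi), (j, xj) in combinations(indexed, 2))


# Hypothetical annual index values, for illustration only.
example = [0.3, 0.1, 0.4, -0.2, -0.1, -0.5, -0.3, -0.6]
z = mann_kendall_z(example)
slope = sens_slope(example)
print(f"Z = {z:.2f}, Sen's slope = {slope:.3f} per year, "
      f"significant at 95%: {abs(z) > 1.96}")
```

A negative Z with |Z| above 1.96 would indicate a decreasing trend at the 95% confidence level, and the sign of Sen’s slope should agree with the sign of Z.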

    Fig. 2 Flow chart of research methods.

    Standardized precipitation evapotranspiration index

    The Standardized Precipitation Evapotranspiration Index (SPEI) is an indicator developed from the Standardized Precipitation Index (SPI)5. It comprehensively considers meteorological factors such as temperature, precipitation, and evapotranspiration. It can stably assess the spatiotemporal characteristics of drought across multiple scales and has been widely applied18,19. According to China’s national standard GB/T 20481-2017 Meteorological Drought Grades, the SPEI drought grades are divided into 5 levels (Table 2).

    Table 2 Evaluation grades of drought severity.

    Trend and mutation analysis

    The Mann–Kendall non-parametric test (abbreviated as M–K test)20,21, Sen’s slope test22, and the Innovative Trend Analysis (ITA) method were employed to analyze interannual changes and trends in the Standardized Precipitation Evapotranspiration Index (SPEI) and the 12 extreme climate indices in the water-receiving area of the Tao River Diversion Project. Among these methods, the ITA test provides complementary visualized results of trends. To validate the Mann–Kendall (M–K) trend test, a linear trend test was used for cross-validation23,24 (see Table 3 for the results of spatial trend classification).

    Table 3 Statistics of spatial trend changes.

    For the M–K trend test, at a given significance level α, if |z| > z_{α/2}, the hypothesis of no trend is rejected and the time series exhibits a significant trend. 
When |z| is greater than 1.645, 1.96 and 2.58, the trend is considered to have passed the significance test at confidence levels of 90, 95 and 99%, respectively25.

    The Innovative Trend Analysis (ITA) procedure begins by dividing a long-term time series into two consecutive sub-intervals of equal length, each sorted in ascending order. The data from the first sub-interval are plotted on the x-axis of a Cartesian coordinate system, and the data from the second on the y-axis. A diagonal line (45°) serves as a reference, bisecting the coordinate plane into upper and lower triangular areas. The trend of the time series is interpreted from the scatter plot as follows: data points distributed along the diagonal indicate stationarity (no trend); points accumulating in the upper triangle denote an increasing trend; and points in the lower triangle signify a decreasing trend. Non-monotonic trends can be characterized by categorizing the data into low, medium, and high-value groups for separate analysis. Comprehensive computational details are available in ref. 26.

    Currently, there are several methods for identifying mutation points, including the M–K test27, Pettitt test28,29, and heuristic segmentation method30. However, uncertainties may exist in the identification of mutation points for drought and extreme climate. Compared with the traditional M–K test, the Sequential Mann–Kendall test is more suitable for mutation analysis, and its core advantage lies in its ability to identify multiple mutation points in a time series. The Sequential Mann–Kendall test is a sequential version of the Mann–Kendall test31,32 and enables more distinct identification of mutation points based on the significance value (α). Specifically, it first locates points where variables shift abruptly, then finds the intersection of the forward and backward sequences with the highest significance values. 
If this value falls within the confidence interval, the mutation point is determined based on the breakpoints with high significance values33.

    Run theory

    Run theory is a time series analysis method that has been widely used to extract drought characteristics34,35. In this study, drought identification was conducted based on the annual-scale Standardized Precipitation Evapotranspiration Index (SPEI). According to the drought grade classification, a drought event occurs only when SPEI ≤ −0.5, with a minimum drought duration of 1 year. The number of drought events, average duration (D), severity (S), and intensity (L) of droughts in the water-receiving area of China’s Tao River Diversion Project were calculated. The average values can comprehensively reflect the long-term drought characteristics of the region, and the specific calculation formulas are as follows:

    $$D = T_{end} - T_{start} + 1$$
    (1)
    $$S = \sum\limits_{t = T_{start}}^{T_{end}} \left| SPEI \right|$$
    (2)
    $$L = \frac{S}{D}$$
    (3)
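
Equations (1) to (3) can be illustrated with a short Python sketch that scans an annual SPEI series for runs at or below the −0.5 threshold and reports duration, severity, and intensity for each run. This is a minimal sketch, not the authors’ implementation: the function names `drought_events` and `summarize_run` and the sample series are hypothetical, and time is measured as a simple 0-based year index.

```python
def summarize_run(spei, t_start, t_end):
    """Apply Eqs. (1)-(3) to one run spanning t_start..t_end (inclusive)."""
    duration = t_end - t_start + 1                              # Eq. (1)
    severity = sum(abs(v) for v in spei[t_start:t_end + 1])     # Eq. (2)
    intensity = severity / duration                             # Eq. (3)
    return {"start": t_start, "end": t_end, "duration": duration,
            "severity": severity, "intensity": intensity}


def drought_events(spei, threshold=-0.5):
    """Return one summary dict per consecutive run with SPEI <= threshold."""
    events = []
    run_start = None
    for t, value in enumerate(spei):
        if value <= threshold:
            if run_start is None:
                run_start = t          # a new drought run begins
        elif run_start is not None:
            events.append(summarize_run(spei, run_start, t - 1))
            run_start = None
    if run_start is not None:          # series ends while still in drought
        events.append(summarize_run(spei, run_start, len(spei) - 1))
    return events


# Hypothetical annual SPEI values, for illustration only.
sample = [0.2, -0.6, -1.0, 0.1, -0.5, 0.3]
for event in drought_events(sample):
    print(event)
```

Averaging `duration`, `severity`, and `intensity` over the returned events would give the per-pixel mean characteristics of the kind mapped in the study.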
    Drought occurrence frequency

    The drought occurrence frequency (P) is the share of years in the study period in which drought of a given grade occurs.

    $$P = \frac{n}{N} \times 100\%$$
    (4)
    where n is the number of years with a specific grade of drought, and N is the total number of years in the study period. In this study, the frequencies of mild, moderate, severe, and extreme droughts, as well as the overall drought frequency, were analyzed based on the pixel-wise SPEI12.

    Wavelet coherence and cross-wavelet

    Wavelet Coherence (WTC) describes the consistency of changes between two sequences across different time scales36,37. To enhance the analytical ability in high-energy regions, the Cross Wavelet Transform (XWT) is simultaneously used to analyze the correlation characteristics between the Standardized Precipitation Evapotranspiration Index (SPEI) and extreme climate indices.

    In this analysis, colors denote the relative energy density, while the thin black solid line defines the cone of influence (COI), within which the spectral values are considered valid. The black contours inside the COI delineate regions exceeding the 95% confidence level (α = 0.05). The phase relationship between the SPEI and extreme climate indices is indicated by the arrow direction: rightward for a positive phase and leftward for a negative phase. Concurrently, the arrow orientation reveals the lead-lag dynamics: an upward orientation denotes that the SPEI lags the extreme index, and a downward orientation denotes that the SPEI leads it.

    Driving factor analysis based on XGBoost and SHAP

    Pearson correlation analysis was used to examine the relationship between meteorological drought and extreme climate indices. To further clarify the response of meteorological drought to extreme climates, machine learning combined with SHAP analysis was adopted to determine the contribution of different extreme climate indices to meteorological drought; XGBoost was selected because it performed best among the 6 machine learning models compared. The XGBoost algorithm is an efficient ensemble learning method, commonly used for predicting high-dimensional data. 
By gradually optimizing the objective function and constructing weighted decision trees, it can capture nonlinear relationships38.

    The Shapley Value method can be used to evaluate the contribution of various extreme climates to the prediction results of meteorological drought. By calculating the marginal contribution of features, SHAP (SHapley Additive exPlanations) can reveal the impact of each factor on meteorological drought39. The XGBoost-SHAP method provides deeper insights into climate driving mechanisms and uncovers complex interdependencies that may be overlooked by traditional analyses10.

    Results and analysis

    Spatiotemporal variation analysis of meteorological drought

    This study analyzed the 12-month Standardized Precipitation Evapotranspiration Index (SPEI12) in the water-receiving area of the Tao River Diversion Project from 1985 to 2018. As shown in Fig. 3a, the region experienced considerable fluctuations in meteorological drought, with the SPEI12 exhibiting a non-significant downward trend at a rate of −0.04 per decade (Zₘ₋ₖ = −0.55, βₛₑₙ’ₛ = −0.006). A distinct wet-to-dry transition was also observed between 1992 and 1997.

    Fig. 3 Analysis of Meteorological Drought Trend Mutation from 1985 to 2018.

    To complement these findings, trend variations across different value ranges were visualized using the Innovative Trend Analysis (ITA) test (Fig. 3b). The results indicated relatively stable SPEI12 values in the high-value zone, a decrease at 62% of the points in the medium-value zone, and non-significant fluctuations in the low-value zone. These outcomes were consistent across the linear trend, Mann–Kendall (M–K), Sen’s slope, and ITA tests.

    The Sequential Mann–Kendall test was further applied to identify abrupt drought shifts. Figure 3c reveals statistically significant mutation points in 1988 and 1990, with the highest significance (α = 0.651) occurring in 1988. Thus, a meteorological drought mutation was confirmed in 1988. 
Additionally, the UF statistic remained below zero after 1992, indicating a progressive intensification of aridity in the study area from that year onward.

    Based on the Sen’s slope estimator and Mann–Kendall (M–K) trend test, this study examined the spatial variation characteristics of meteorological drought in the water-receiving area of China’s Tao River Diversion Project from 1985 to 2018 (Fig. 4).

    Fig. 4 Spatial Variation Trend and Distribution of Meteorological Drought. Note: The figure was made in ArcGIS 10.8 (https://desktop.arcgis.com/zh-cn/index.html) using a base map from the natural resources standard map service (http://bzdt.ch.mnr.gov.cn/). The approval number is GS(2024) 0650, and the boundary of the base map is not modified.

    As illustrated in Fig. 4a, the region overall exhibited a persistent drying trend during the study period. Specifically, the northwest showed a slight or non-significant decrease in SPEI values, indicating a tendency toward aridification over the long term. In contrast, most of the southeast experienced a non-significant increase in SPEI, suggesting a gradual shift toward humidification.

    Analysis of the spatial distribution of the mean annual SPEI (Fig. 4b) further revealed relatively high values (indicating wetter conditions) in the northwest and parts of the east, likely influenced by monsoon patterns, while the southern and central parts showed relatively low SPEI values, reflecting drier conditions.

    Spatial distribution of drought event characteristics

    Based on run theory, this study conducted a pixel-wise drought analysis using SPEI12 data, with drought events defined as periods where SPEI ≤ −0.5 lasting at least one year.

    The results (Fig. 5) reveal that the highest frequency of drought events occurs in the southeastern part of the water-receiving area of the Tao River Diversion Project, with up to 11 recorded episodes. 
In contrast, the western areas experienced the fewest droughts, with only 6 events, while the central and northern regions showed moderate occurrence. In terms of average duration, droughts tended to persist longer (up to 3 years) in the southeast and parts of the central region, whereas shorter durations were observed elsewhere. A similar spatial pattern was found for average severity, with higher values (up to 3.14) in the southern and western regions and lower values in the remaining areas.

    Fig. 5 Distribution of Drought Event Characteristics. Note: The figure was made in ArcGIS 10.8 (https://desktop.arcgis.com/zh-cn/index.html) using a base map from the natural resources standard map service (http://bzdt.ch.mnr.gov.cn/). The approval number is GS(2024) 0650, and the boundary of the base map is not modified.

    Notably, the spatial distribution of average drought intensity exhibited an inverse pattern to that of duration: high values were concentrated in the northwest and low values in the southeast. Overall, although the southern region experienced fewer drought events, those that did occur were relatively prolonged. Conversely, in the drought-prone east, the duration of individual events was generally shorter.

    Figure 6 displays the spatial distribution of drought frequency at the annual scale. Mild droughts occur primarily in the southeastern and central parts of the region, while their frequency is notably lower in the northwest. In contrast, the pattern for moderate drought is essentially the inverse, with a higher frequency in the northwest and a lower frequency in the southeast. Severe drought occurrences are relatively uniform across the entire study area. Extreme droughts, however, are concentrated mainly in the central and western zones and are absent elsewhere. In summary, while the southeastern region experiences more frequent droughts overall, the northwestern area has a higher propensity for moderate drought.

    Fig. 
6 Spatial Distribution of Drought Frequency. Note: The figure was made in ArcGIS 10.8 (https://desktop.arcgis.com/zh-cn/index.html) using a base map from the natural resources standard map service (http://bzdt.ch.mnr.gov.cn/). The approval number is GS(2024) 0650, and the boundary of the base map is not modified.

    Spatiotemporal variation analysis of extreme climate indices

    Analysis of extreme climate indices in the water-receiving area of the Tao River Diversion Project reveals distinct warming trends and subtle shifts in precipitation patterns (Fig. 7 and Table 4). For temperature-related indices, a marked increase was observed in summer days (SU) at a rate of 4.4 d/10a, suggesting more frequent intense warm events. Concurrently, cold nights (TN10p) decreased significantly (−1.5%/10a), while cold days (TX10p) declined insignificantly (−1.2%/10a). In contrast, both warm nights (TN90p) and warm days (TX90p) increased significantly, by 3.9%/10a and 2.3%/10a, respectively. The warm spell duration index (WSDI) also rose significantly (1 d/10a), whereas the cold spell duration index (CSDI) showed an insignificant decrease (−0.2 d/10a). Collectively, these results indicate that the magnitude of increase in extreme warm indices substantially exceeds the decrease in extreme cold indices.

    Fig. 7 Linear trend test of extreme climate.

    Table 4 Statistical table of extreme climate trend tests.

    Among precipitation indices, only the simple daily intensity index (SDII) increased significantly, by 0.2 (mm/day)/10a. Other indices, including the days with precipitation ≥ 20 mm (R20mm; 0.4 d/10a), total precipitation on very wet days (R95pTOT; 2.2%/10a), and the annual total precipitation (PRCPTOT; 23.9 mm/10a), exhibited insignificant upward trends. Consecutive wet days (CWD) decreased insignificantly (−0.3 d/10a). 
Although most precipitation trends are not statistically significant, they collectively suggest a potential tendency toward future humidification in the region.

    The variation trends identified were consistent across the linear trend, Mann–Kendall, and Sen’s slope tests.

    Between 1985 and 2018, the water-receiving area of the Tao River Diversion Project exhibited considerable spatial heterogeneity in extreme climate indices (Fig. 8).

    Fig. 8 Spatial Distribution of Extreme Climate Indices. Note: The figure was made in ArcGIS 10.8 (https://desktop.arcgis.com/zh-cn/index.html) using a base map from the natural resources standard map service (http://bzdt.ch.mnr.gov.cn/). The approval number is GS(2024) 0650, and the boundary of the base map is not modified.

    The summer days (SU) ranged from 6.65 to 88.74 days, about a 13.3-fold difference, with higher values in the periphery and lower values in the central area. The spatial distributions of cold nights (TN10p) and cold days (TX10p) were similar, with higher frequencies in the northwest and central parts and lower frequencies in the southeast. Their maximum values reached 10.22% and 10.24%, and minimum values were 8.78% and 9.38%, respectively.

    In contrast, warm nights (TN90p) and warm days (TX90p) both reached their maxima in the southeast, with minima located in the northwest and central regions, respectively. The spatial pattern of the warm spell duration index (WSDI) resembled that of warm days. The cold spell duration index (CSDI) showed marked spatial variation, with a maximum value of 7.59 days concentrated in the south, while most other areas recorded a minimum of 1 day.

    Among precipitation-related indices, the simple daily intensity index (SDII) and days with precipitation ≥ 20 mm (R20mm) shared a similar spatial structure: low in the north and south, high in the east and west. Both consecutive wet days (CWD) and annual total precipitation (PRCPTOT) decreased from south to north. 
In contrast, the total precipitation on very wet days (R95pTOT) was lower in the central area and higher in the surrounding regions.

    For extreme climate indices with multiple mutation points identified, the most statistically significant point from the Sequential Mann–Kendall test was selected as the definitive mutation year. As summarized in Table 5, a significant mutation in meteorological drought occurred in 1988.

    Table 5 Mutation point test of extreme climate indices.

    The mutation points for extreme temperature indices, including warm nights (TN90p), warm days (TX90p), summer days (SU), cold nights (TN10p), and cold days (TX10p), were concentrated between 1990 and 2000, showing a slight lag relative to the 1988 drought mutation. Similarly, mutation points for extreme precipitation indices (SDII, R20mm, R95pTOT) generally occurred later, with most emerging after 2000. Specifically, the simple daily intensity index (SDII) and the total precipitation on very wet days (R95pTOT) mutated between 2007 and 2008, whereas the days with precipitation ≥ 20 mm (R20mm) and annual total precipitation (PRCPTOT) shifted as late as 2015–2016.

    This sequential pattern suggests that the mutation in meteorological drought is an integrated response to the combined effects of temperature and moisture, reacting more directly to global warming. In contrast, extreme precipitation depends not only on sufficient water vapor but also on specific atmospheric circulation patterns, making its triggering mechanisms more complex. This added complexity likely accounts for the observed lag in its mutation points.

    Wavelet analysis of meteorological drought and extreme climate

    A correlation analysis was conducted between the annual Standardized Precipitation Evapotranspiration Index (SPEI) and several extreme climate indices (Fig. 9), with the key findings summarized as follows.

    Fig. 
9 Wavelet spectra of meteorological drought and extreme climate.

    Significant periodic resonances and phase relationships were identified between SPEI and specific extreme indices. For summer days (SU), cross wavelet transform (XWT) revealed a significant 3–4 year resonance period with a negative phase during 1990–1996, while wavelet coherence (WTC) showed strong correlations around the 4-year scale from 1990 to 2001 and a 4–5 year period during 2009–2012, both indicating SPEI lagging behind SU under a negative correlation. In the case of cold nights (TN10p), although XWT showed no significant resonance, WTC identified a relatively significant 0–3 year period with SPEI leading TN10p under weak negative correlation. For cold days (TX10p), WTC indicated high-energy resonances at 2–4 years (1990–1998) and 5–7 years (2008–2011), showing a positive correlation with SPEI leading. Regarding warm nights (TN90p), WTC detected significant resonances at 0–2 years (1990–1995) and 4–5 years (2009–2012), with upward-left arrows suggesting negative correlation and SPEI lagging. Finally, for warm days (TX90p), XWT indicated a sub-annual resonance during 2003–2006, while WTC revealed a significant 8–9 year period from 1996 to 2007, also reflecting negative correlation and SPEI lagging behind TX90p.

    Wavelet analysis of SPEI and the warm spell duration index (WSDI) revealed a significant 0–1 year resonance period during 1998–2001 in the XWT results. The WTC analysis further identified three significant resonance periods: two 0–1 year bands during 1987–1991 and 1997–1999, and an 8–10 year period during 1998–2007. All of these resonance periods exhibited a significant negative correlation, with SPEI lagging behind WSDI. For the cold spell duration index (CSDI), the XWT analysis showed no significant resonance period, although intermittent energy concentration was observed at the 0–7 year scale. 
In contrast, the WTC analysis detected a significant 0–1 year resonance period during 2001–2003, which also showed a significant negative correlation with SPEI lagging behind CSDI.

    For precipitation-related indices, SPEI and both the simple daily intensity index (SDII) and the days with precipitation ≥ 20 mm (R20mm) exhibited similar correlation patterns. The XWT results showed significant approximately 3-year resonance periods during 1993–1997 for SPEI-SDII and during 1992–1996 for SPEI-R20mm. The WTC analysis identified three common significant resonance periods: 0–8 years during 1998–2013, 0–7 years during 1998–2003, and 5–10 years during 1996–2015. All showed significant positive correlations with SPEI leading both SDII and R20mm. For consecutive wet days (CWD), the XWT revealed a significant approximately 6-year resonance period during 2007–2011, while the WTC analysis showed significant 0–1 year resonance periods during 1989–1990 and 2001–2003, as well as a significant 3–6 year period during 1990–2012. These correlations were significantly positive with SPEI lagging behind CWD. Regarding the total precipitation on very wet days (R95pTOT), the XWT detected a significant 2–3 year resonance period during 1992–1997, and the WTC analysis identified a significant 0–8 year period during 1998–2012, showing a significant positive correlation with SPEI leading. For annual total precipitation (PRCPTOT), the XWT indicated a significant less-than-1-year resonance period during 2002–2003, and the WTC analysis revealed a significant 0–6 year period during 1998–2012, exhibiting a significant positive correlation with SPEI leading.

    Overall, the Standardized Precipitation Evapotranspiration Index (SPEI) exhibits a negative correlation with extreme temperature indices and a positive correlation with extreme precipitation indices. 
Specifically, SPEI lags behind several warm temperature indices, namely summer days (SU), warm nights (TN90p), warm days (TX90p), and warm spell duration index (WSDI), while it leads the cold temperature indices, cold nights (TN10p) and cold days (TX10p), as well as several precipitation-related indices: simple daily intensity index (SDII), days with precipitation ≥ 20 mm (R20mm), total precipitation on very wet days (R95pTOT), and annual total precipitation (PRCPTOT).

    In terms of temporal dynamics, SPEI and many extreme temperature indices show significant resonance on short time scales (under 5 years), reflecting the rapid influence of changing climatic conditions on drought. In contrast, certain extreme precipitation indices (SDII, R20mm, and consecutive wet days (CWD)) display significant resonance with SPEI on medium to long time scales (over 5 years), highlighting the long-term regulatory role of precipitation patterns in drought evolution.

    Analysis of drought driving factors

    Analysis of driving factors based on Pearson correlation

    Pearson correlation analysis (Fig. 10) indicates that from 1985 to 2018, meteorological drought, as represented by the Standardized Precipitation Evapotranspiration Index (SPEI), exhibited distinct relationships with various extreme climate indices. SPEI was positively correlated with annual total precipitation (PRCPTOT), days with precipitation ≥ 20 mm (R20mm), simple daily intensity index (SDII), and cold days (TX10p). Conversely, it was negatively correlated with the summer days (SU), warm days (TX90p), and the warm spell duration index (WSDI).

    Fig. 10 Pearson correlation heatmap.

    Among these correlations, SPEI showed a highly significant positive correlation with PRCPTOT and a highly significant negative correlation with SU (both at *p* ≤ 0.01). Meteorological drought was most strongly associated with PRCPTOT and SU, and most weakly associated with total precipitation on very wet days (R95pTOT) and cold nights (TN10p). 
These results suggest that increased annual precipitation corresponds to higher SPEI values (indicating a trend toward wetter conditions), whereas more summer days are associated with lower SPEI, reflecting intensified drought.

Analysis of driving factors based on machine learning

Several nonlinear fitting algorithms, including Decision Tree Regressor, Random Forest Regressor, Support Vector Regressor (with RBF kernel), XGBoost Regressor, Network Regressor, and Gaussian Process Regressor, were employed for model fitting. The coefficient of determination (R²) was used to quantify the proportion of variance explained by the model40, though it does not convey prediction error. In contrast, the root mean square error (RMSE) reflects the accuracy of model predictions41.

Results (Fig. 11) indicate that XGBoost achieved the best performance, with an R² of 0.8335 and an RMSE of 0.265. Based on this, an SPEI12 prediction model was developed using XGBoost. Bayesian optimization was applied to determine the optimal hyperparameters: a maximum tree depth of 3, a learning rate of 0.096, L1 regularization of 0.0187, and L2 regularization of 0.0124. The dataset was split into 80% for training and 20% for testing. Through incremental learning and early stopping, the optimal number of iterations was identified as 62.

Fig. 11 Model evaluation.

The model exhibited excellent performance on both the training set (R² = 0.94, RMSE = 0.21) and the test set (R² = 0.91, RMSE = 0.25), indicating strong predictive capability, high generalization capacity, and no signs of overfitting. Overall, the model demonstrates high effectiveness and fulfills the required prediction standards.

To further investigate the driving factors of meteorological drought, SHAP (SHapley Additive exPlanations) analysis was employed. As illustrated in Fig. 12a, annual total precipitation (PRCPTOT) is the primary driver of changes in meteorological drought (SPEI12), with the highest mean absolute SHAP value of 0.245, confirming the dominant role of precipitation. Cold days (TX10p) and summer days (SU) rank second and third, with mean absolute SHAP values of 0.198 and 0.154, respectively. While Pearson correlation analysis identified PRCPTOT, SU, and the days with precipitation ≥ 20 mm (R20mm) as the top three influential factors, SHAP analysis reveals that TX10p and SU may significantly influence drought through nonlinear mechanisms.

Fig. 12 SHAP importance and summary plot.

The SHAP contributions of warm days (TX90p), consecutive wet days (CWD), and warm spell duration index (WSDI) are 0.093, 0.08, and 0.069, respectively, significantly lower than those of PRCPTOT, TX10p, and SU. By integrating Pearson correlation analysis with SHAP interpretation, this study not only identifies the key drivers of meteorological drought and confirms the importance of annual precipitation, but also uncovers nonlinear relationships that cannot be captured by correlation analysis alone.

To quantify the relative contributions of different variables, the absolute Shapley values were averaged and normalized to sum to 100%. Among the 12 extreme climate indices, PRCPTOT, TX10p, and SU emerge as the top three drivers, contributing 23.35%, 18.93%, and 14.75%, respectively, to the model output. Other variables such as TX90p, CWD, WSDI, and SDII have relatively minor impacts, with contributions of 8.92%, 7.66%, 6.60%, and 5.80%, respectively. The remaining indices (TN90p, TN10p, R20mm, CSDI, and R95pTOT) each contribute less than 5%.

Figure 12b further illustrates the direction of influence of each variable. When an increase in a feature value corresponds to a positive SHAP value, it indicates that the variable contributes to higher SPEI values (reduced drought).
PRCPTOT, TX10p, CWD, SDII, TN10p, and TN90p exhibit such positive effects, suggesting they alleviate aridification. Conversely, SU, TX90p, and WSDI show negative relationships, indicating they intensify drought conditions. Meanwhile, R20mm, CSDI, and R95pTOT are distributed near zero SHAP values, reflecting their minimal contribution to SPEI12 predictions.

To better elucidate the influence of dominant factors on SPEI12, this study employed SHAP (SHapley Additive exPlanations) dependence plots for all factors, ordered by importance, to show how each extreme climate index affects SPEI12 predictions.

As shown in Fig. 13, PRCPTOT (annual total precipitation) exhibits a clear linear relationship with SHAP values: as precipitation increases, SHAP values consistently rise, indicating a direct humidifying effect on drought conditions. Similarly, TX10p (cold days) also contributes to humidification, though in a nonlinear manner, with the rate of moisture increase slowing as TX10p rises. In contrast, SU (summer days) shows a straightforward negative influence, where more summer days directly intensify aridification.

Fig. 13 Diagram of the dependence relationship between extreme factors and meteorological drought.

TX90p (warm days) and CSDI (cold spell duration index) display threshold-based behaviors. For TX90p, values below 21.23% enhance aridification, with the effect diminishing as the threshold is approached. CSDI exhibits a dual pattern: below 11.69 days, it promotes aridification, while above this value, it favors humidification.

CWD (consecutive wet days) demonstrates a notable nonlinear relationship. Below 8.58 days, it contributes to humidification at a decelerating rate; beyond this threshold, however, it begins to promote aridification.
This may be because short wet spells replenish soil moisture, whereas prolonged precipitation increases runoff or evaporation, reducing water use efficiency.

WSDI (warm spell duration index), TN90p (warm nights), TN10p (cold nights), and R95pTOT (total precipitation on very wet days) all follow similar threshold-dependent patterns, with respective thresholds of 2.15 days, 14.79%, 11.74%, and 18.95%. Notably, R95pTOT has only a minor influence on SHAP values. Meanwhile, both SDII (simple daily intensity index) and R20mm (days with precipitation ≥ 20 mm) promote humidification at an accelerating rate as they increase.

Overall, the analysis reveals that most extreme climate indices exhibit complex nonlinear relationships with meteorological drought, often involving specific risk thresholds (as seen with CWD, TX90p, and WSDI). In contrast, SU and PRCPTOT maintain simple linear relationships with SPEI12.

Discussion

Aridification and extreme climate change trends

The persistent aridification trend observed in the water-receiving area of China’s Tao River Diversion Project aligns closely with the documented warm-dry pattern across Northwest China42,43. This trend is further corroborated by pronounced changes in extreme climate indices: a marked increase in warm extremes (SU, TX90p, TN90p), a decline in cold extremes (TX10p, TN10p), and an overall rise in extreme precipitation indicators (R20mm, R95pTOT, PRCPTOT). These shifts collectively reflect the sharp increases in both temperature and precipitation that Northwest China has experienced over the past half-century44.
The complexity of precipitation variability in the region may be attributed to Gansu Province’s location in central Eurasia, where interactions between the Western Pacific Monsoon and westerly flows from the Mediterranean, Black Sea, and North Atlantic shape local rainfall patterns, resulting in a highly variable and intricate extreme precipitation regime45.

Explanation of lag response

Wavelet analysis reveals distinct correlation and lag effects between meteorological drought and extreme climate indices. The observed negative correlation and lagged response of drought to extreme high temperatures may be attributed to accumulated heat stress leading to sustained moisture loss, while increased temperatures enhance evapotranspiration from the land surface, thereby delaying drought recovery. This interpretation is consistent with existing studies reporting a gradual rise in evapotranspiration across Northwest China46. Conversely, meteorological drought exhibits a leading response and positive correlation with extreme precipitation indices (e.g., SDII and R20mm), suggesting that increased extreme precipitation provides short-term drought alleviation. The short-term resonance period between extreme high temperature and drought contrasts with the longer resonance period observed for extreme precipitation. This indicates that while temperature directly drives drought onset, precipitation plays a moderating role that operates over longer time scales. These findings collectively underscore that temperature remains the dominant factor influencing drought dynamics in the arid inland regions of Northwest China47.

Discovery of nonlinear threshold effect

The XGBoost-SHAP model effectively quantified the contribution rates and nonlinear thresholds of extreme climate factors influencing drought in the water-receiving area of the Tao River Diversion Project.
Among these factors, annual total precipitation (PRCPTOT) is the most important variable alleviating drought, with a contribution rate of 23.35%, followed by cold days (TX10p) at 18.93%. A notable threshold effect was observed for consecutive wet days (CWD). When CWD exceeds 8.58 days, it may paradoxically intensify drought, likely because prolonged rainfall increases surface runoff and reduces water infiltration efficiency. Similarly, warm nights (TN90p) and cold nights (TN10p) exhibit threshold behaviors at 14.79% and 11.74%, respectively. In contrast, the cold spell duration index (CSDI) shows an opposite pattern: values below 11.69 days exacerbate drought, while those above this threshold alleviate it. This can be explained by the fact that shorter cold spells (low CSDI) in winter lead to early soil thaw and enhanced evaporation before spring, intensifying aridification. Conversely, longer cold spells promote the formation of stable snow cover, which preserves water for spring and reduces evaporation losses during the non-growing season.

Limitations and prospects

This study has several limitations that point to valuable directions for future research. Although the CHM_Drought dataset was employed to alleviate the limitations posed by sparse meteorological station coverage, uncertainties in data representation may still persist at small spatial scales. Furthermore, the temporal scope of the data is limited to records up to 2018, excluding more recent years that could provide additional insights. In terms of climate indicators, the selection of extreme climate indices did not include weak precipitation measures, which hold relevance for understanding drought mechanisms and should be incorporated in subsequent studies.

Additionally, the use of annual-scale data may obscure important seasonal or sub-seasonal variations in meteorological drought (SPEI) and extreme climate events.
Given the strong seasonality typically exhibited by droughts and extreme climate phenomena, future analyses would benefit from adopting a seasonal or monthly analytical framework. While the machine learning models demonstrated strong capability in identifying complex patterns and relationships, their capacity to elucidate the underlying physical mechanisms remains limited. Future work should integrate reanalysis data to explore the synergistic effects of large-scale climate drivers, such as ENSO, on regional extreme climate patterns and drought dynamics.

Recommendations for regional water resources management

The observed warm-drying trend underscores the inadequacy of relying solely on local precipitation to ensure regional water security. In response, the agricultural sector should proactively adjust cropping structures and promote water-saving technologies to mitigate the increasing evapotranspiration pressure driven by persistent warming. Among precipitation-related factors, annual total precipitation (PRCPTOT) is identified as the primary variable alleviating drought, with a contribution rate of 23.35%. It is thus advisable to leverage water storage infrastructure, such as reservoirs, in years with higher PRCPTOT to build strategic reserves and buffer against uncertainties in interannual precipitation variability. Furthermore, the nonlinear thresholds derived from the machine learning model can serve as critical benchmarks for drought early warning. This study recommends that relevant agencies integrate these key drivers into a comprehensive monitoring and risk-alert framework to enhance drought preparedness and adaptive resource management.

Conclusions

    (1) From 1985 to 2018, the SPEI12 index in the water-receiving area of the Tao River Diversion Project revealed a gradual yet persistent aridification trend, marked by a pivotal transition point around 1988. The summer days (SU) increased significantly, as did the warm days and nights. Conversely, a pronounced decline was observed in cold nights and cold days. Although most extreme precipitation indices exhibited mild increases, their overall trends were not statistically significant.

    (2) The relationship between meteorological drought (represented by SPEI) and extreme climate indices is complex and nonlinear. SPEI was negatively correlated with extreme temperature indices but positively correlated with extreme precipitation indices. On shorter time scales, most extreme climate indices exhibited significant resonance periods with SPEI, reflecting the rapid influence of changing weather conditions on drought. In contrast, extreme precipitation indices showed resonance over longer time scales.

    (3) The main drivers of meteorological drought were identified as annual total precipitation (PRCPTOT, 23.35%), the cold days (TX10p, 18.93%), and the summer days (SU, 14.75%). Increased annual precipitation significantly alleviates drought conditions, while most extreme climate indices exhibit nonlinear relationships with SPEI.
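
The percentage contributions quoted in conclusion (3) come from averaging the absolute SHAP values per feature and rescaling them to sum to 100%. A minimal sketch of that step follows, assuming a SHAP matrix of shape (samples, features). The function name `normalized_contributions` and the toy numbers are illustrative assumptions, not values or code from the study.

```python
import numpy as np


def normalized_contributions(shap_values, feature_names):
    """Return each feature's share (%) of the total mean absolute SHAP value."""
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2 or shap_values.shape[1] != len(feature_names):
        raise ValueError("expected shape (n_samples, n_features)")
    mean_abs = np.abs(shap_values).mean(axis=0)  # mean |SHAP| per feature
    total = mean_abs.sum()
    if total == 0:
        raise ValueError("all SHAP values are zero")
    shares = 100.0 * mean_abs / total            # shares sum to 100
    return dict(zip(feature_names, shares))


# Toy SHAP matrix (hypothetical values): 4 samples x 3 features.
toy_shap = np.array([
    [0.30, -0.10, 0.05],
    [-0.20, 0.20, -0.05],
    [0.25, -0.15, 0.10],
    [-0.25, 0.15, 0.00],
])
shares = normalized_contributions(toy_shap, ["PRCPTOT", "TX10p", "SU"])
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.2f}%")
```

Because the shares are ratios of mean absolute values, they rank features by magnitude of influence only; the sign of the effect has to be read from the summary or dependence plots.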

    Data availability

    The SPEI index was derived from the CHM_Drought dataset (https://zenodo.org/records/14634774), a raster dataset with a spatial resolution of 0.1° × 0.1°. This dataset is robust and capable of accurately capturing drought events across mainland China. The SPEI-12 index, covering the period from 1985 to December 2018, was selected for analysis. ArcMap 10.8 was used to crop and mask the downloaded data, enabling the extraction of SPEI data at various scales. Extreme climate data were obtained from the dataset provided by the National Cryosphere Desert Data Center (http://www.ncdc.ac.cn), an NC-format dataset with a spatial resolution of 0.1° × 0.1°.
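
The crop-and-mask step was carried out in ArcMap. For readers working in a scripting environment, the same idea can be sketched with NumPy. Everything below is an assumption for illustration: the grid is synthetic, the bounding box is arbitrary rather than the actual study area, and `crop_to_bbox` is a hypothetical helper, not part of either dataset's tooling.

```python
import numpy as np


def crop_to_bbox(field, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Crop a 2-D (lat, lon) grid to a bounding box; returns (sub, lats, lons)."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_sel = (lats >= lat_min) & (lats <= lat_max)
    lon_sel = (lons >= lon_min) & (lons <= lon_max)
    sub = np.asarray(field, dtype=float)[np.ix_(lat_sel, lon_sel)]
    return sub, lats[lat_sel], lons[lon_sel]


# Synthetic 0.1-degree grid of cell centres (hypothetical extent and values).
lats = 30.05 + 0.1 * np.arange(100)
lons = 100.05 + 0.1 * np.arange(100)
field = np.random.default_rng(1).normal(size=(lats.size, lons.size))
field[0, 0] = np.nan  # a missing cell, as rasters often contain

sub, sub_lats, sub_lons = crop_to_bbox(field, lats, lons, 34.0, 36.0, 103.0, 106.0)
# Area mean over the box, ignoring missing cells.
print(sub.shape, float(np.nanmean(sub)))
```

A rectangular box is the simplest case; masking to an irregular basin boundary would additionally need a polygon mask on the same grid.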
    References

    1. Abbas, H. & Ali, Z. A novel statistical framework of drought projection by improving ensemble future climate model simulations under various climate change scenarios. Environ. Monit. Assess. 196, 938 (2024).
    2. Jiang, Z. et al. Extreme climate events in China: IPCC-AR4 model evaluation and projection. Clim. Change 110, 385–401 (2012).
    3. Zhong, Y. et al. Seasonal drought classification and its characteristics in the red soil region of southern China. J. Hydrol.: Regional Stud. 60, 102587 (2025).
    4. Cancelliere, A., Mauro, G. D., Bonaccorso, B. & Rossi, G. Drought forecasting using the standardized precipitation index. Water Resour. Manage 21, 801–819 (2007).
    5. Vicente-Serrano, S. M., Beguería, S. & López-Moreno, J. I. A multiscalar drought index sensitive to global warming: The standardized precipitation evapotranspiration index. J. Clim. 23, 1696–1718 (2010).
    6. Shukla, S. & Wood, A. W. Use of a standardized runoff index for characterizing hydrologic drought. Geophys. Res. Lett. 35, 2405-2401–2405-2407 (2008).
    7. Wells, N., Goddard, S. & Hayes, M. J. A self-calibrating Palmer drought severity index. J. Clim. 17, 2335–2351. https://doi.org/10.1175/1520-0442(2004)017<2335:ASPDSI>2.0.CO;2 (2004).
    8. Liu, Q. et al. The optimal applications of scPDSI and SPEI in characterizing meteorological drought, agricultural drought and terrestrial water availability on a global scale. Sci. Tot. Environ. 952, 175933. https://doi.org/10.1016/j.Scitotenv.2024.175933 (2024).
    9. Jehanzaib, M., Shah, S. A., Yoo, J. & Kim, T.-W. Investigating the impacts of climate change and human activities on hydrological drought using non-stationary approaches. J. Hydrol. 588, 125052 (2020).
    10. Shah, W., Chen, J. & Naseer, S. Climate change and crop yields in Pakistan: A machine learning approach to understanding temperature extremes and drought effects on wheat and rice. Theoret. Appl. Climatol. 156, 536. https://doi.org/10.1007/s00704-025-05759-7 (2025).
    11. Wilson, A. B., Avila-Diaz, A., Oliveira, L. F., Zuluaga, C. F. & Mark, B. Climate extremes and their impacts on agriculture across the Eastern Corn Belt Region of the U.S. Weather Climate Extremes 37, 100467. https://doi.org/10.1016/j.Wace.2022.100467 (2022).
    12. Jamalzi, A. R., Rahman, G., Akhtar, F., Ikram, Q. D. & Kwon, H. H. Spatiotemporal assessment and trend analysis of meteorological drought in Afghanistan (1974–2023) using SPI and SPEI indices. J. Hydrol.: Regional Stud. 61, 102711. https://doi.org/10.1016/j.Ejrh.2025.102711 (2025).
    13. Hui, Y. & Fuqing, B. Study on spatiotemporal evolution and driving forces of meteorological drought in the Yellow River Basin considering spatial heterogeneity. China Rural Water Hydropower 2, 74–80 (2025).
    14. Qu, Z. C., Huang, S., Liu, S., Wang, L. & Liu, D. Extreme climate and its impact on hydrological drought in the Xilin River Basin of Inner Mongolia over the past 60 years. Acta Ecol. Sin. 45, 5386–5397 (2025).
    15. Sifang, F. et al. Climate change impacts on concurrences of hydrological droughts and high temperature extremes in a semi-arid river basin of China. J. Arid Environ. https://doi.org/10.1016/j.Jaridenv.2022.104768 (2022).
    16. Zhang, Q. et al. A new high-resolution multi-drought-index dataset for mainland China. Earth Syst. Sci. Data 17, 837–853 (2025).
    17. Donat, M. G. et al. Updated analyses of temperature and precipitation extreme indices since the beginning of the twentieth century: The HadEX2 dataset. J. Geophys. Res.: Atmospheres 118, 2098–2118 (2013).
    18. Sun, W. et al. Seasonal prediction of spring drought over Northeast China. J. Appl. Meteorol. Climatol. 64, 1017–1032 (2025).
    19. Angulo, D. P. et al. Multidecadal changes in hydrological droughts across Sub-Saharan Africa. J. Hydrol.: Regional Stud. 60, 102595 (2025).
    20. Mann, H. B. Nonparametric tests against trend. Econometrica J. Econometric Soc. 13, 245–259 (1945).
    21. Kendall, M. G. Rank Correlation Methods. (1975).
    22. Sen, P. K. Estimates of the regression coefficient based on Kendall’s tau. J. Am. Stat. Assoc. 63, 1379–1389 (1968).
    23. Swagatika, S. S., Ashok, M., Chandranath, C. & Bhabagrahi, S. Climate-changed versus land-use altered streamflow: A relative contribution assessment using three complementary approaches at a decadal time-spell. J. Hydrol. https://doi.org/10.1016/j.Jhydrol.2021.126064 (2021).
    24. Petchprayoon, P., Blanken, P. D., Ekkawatpanit, C. & Hussein, K. Hydrological impacts of land use/land cover change in a large river basin in central-northern Thailand. Int. J. Climatol. 30, 1917–1930 (2010).
    25. Yanlin, L., Yi, H., Yaru, Z. & Liping, J. Spatiotemporal evolutionary analysis of rainfall erosivity during 1901–2017 in Beijing, China. Environ. Sci. Pollution Res. 29, 2510–2522. https://doi.org/10.1007/s11356-021-15639-y (2021).
    26. Zhang, J. et al. Analysis of extreme precipitation changes in Lijiang River Basin based on ITA and Mann-Kendall methods. Journal of Hohai University (Natural Sciences) 52, 15–22 (2024).
    27. Shah, S. A., Jehanzaib, M., Kim, M. J., Kwak, D. Y. & Kim, T. W. Spatial and temporal variation of annual and categorized precipitation in the Han River Basin, South Korea. KSCE J. Civ. Eng. 26, 1–12. https://doi.org/10.1007/s12205-022-1194-y (2022).
    28. Shah, S. A., Jehanzaib, M., Yoo, J., Hong, S. & Kim, T.-W. Investigation of the effects of climate variability, anthropogenic activities, and climate change on streamflow using multi-model ensembles. Water 14, 512 (2022).
    29. Ali, S. S., Muhammad, J., Woon, P. K., Sijung, C. & Woong, K. T. Evaluation and decomposition of factors responsible for alteration in streamflow in lower watersheds of the Han river basin using different Budyko-based functions. KSCE J. Civ. Eng. 27, 903–914. https://doi.org/10.1007/s12205-022-0650-z (2022).
    30. Civil, D. O. et al. Investigating the impacts of climate change and human activities on hydrological drought using non-stationary approaches. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2020.125052 (2020).
    31. Zhao, J. et al. Analysis of temporal and spatial trends of hydro-climatic variables in the Wei River Basin. Environ. Res. 139, 55–64. https://doi.org/10.1016/j.envres.2014.12.028 (2015).
    32. Douglas, E. M., Vogel, R. M. & Kroll, C. N. Trends in floods and low flows in the United States: Impact of spatial correlation. J. Hydrol. 240, 90–105. https://doi.org/10.1016/s0022-1694(00)00336-x (2000).
    33. Swagatika, S. S., Bhushan, K. S., Ashok, M. & Chandranath, C. Sensitive or resilient catchment?: A Budyko-based modeling approach for climate change and anthropogenic stress under historical to CMIP6 future scenarios. J. Hydrol. https://doi.org/10.1016/j.Jhydrol.2023.129651 (2023).
    34. Alemu, M. M. et al. Spatiotemporal analysis of drought characteristics across multiple timescales in the upper Blue Nile basin, Ethiopia. Theoretic. Appl. Climatol. 156, 435 (2025).
    35. Wang, M., Chen, Y., Li, J. & Zhao, Y. Spatiotemporal evolution and driving force analysis of drought characteristics in the Yellow River Basin. Ecol. Ind. 170, 113007 (2025).
    36. Nourani, V., Tootoonchi, R. & Andaryani, S. Investigation of climate, land cover and lake level pattern changes and interactions using remotely sensed data and wavelet analysis. Ecol. Inform. 64, 101330 (2021).
    37. Abdul, F. M. et al. Implications of rainfall variability on groundwater recharge and sustainable management in South Asian capitals: An in-depth analysis using Mann Kendall tests, continuous wavelet coherence, and innovative trend analysis. Groundwater Sustain. Develop. 24, 101060 (2024).
    38. Yang, K., Ma, Y. & Tang, Q. Visualisation of key thresholds of crop production potential and their future distribution patterns in china under climate change scenarios. J. Geovisual. Spatial Analysis 9, 33 (2025).
    39. Zhang, F. et al. Simulation and explanatory analysis of dissolved oxygen dynamics in Lake Ulansuhai, China. J. Hydrol.: Regional Stud. 57, 102109 (2025).
    40. Villasante, A., Serrano, Á. F., Sequera, C. O. & Hermoso, E. Methodology for stiffness prediction in structural timber using cross-validation RMSE analysis. J. Build. Eng. 107, 112767. https://doi.org/10.1016/j.Jobe.2025.112767 (2025).
    41. Alexander, D. L., Tropsha, A. & Winkler, D. A. Beware of R²: simple, unambiguous assessment of the prediction accuracy of QSAR and QSPR models. J. Chem. Inf. Model. 55, 1316–1322 (2015).
    42. Li, M., Wang, X., Sun, D., Ma, Y. & Feng, X. Analysis of spatiotemporal evolution characteristics of meteorological droughts in Gansu Province based on SPEI. Res. Soil Water Conserv. 32, 191–200 (2025).
    43. Chen, S., Men, B., Pang, J., Bian, Z. & Wang, H. Historical and projected extreme climate changes in the upper Yellow River Basin, China. Sci. Rep. 15, 19061 (2025).
    44. Chen, Y., Li, Z., Fan, Y., Wang, H. & Deng, H. Progress and prospects of climate change impacts on hydrology in the arid region of northwest China. Environ. Res. 139, 11–19. https://doi.org/10.1016/j.envres.2014.12.029 (2015).
    45. An, D. et al. Evidence of climate shift for temperature and precipitation extremes across Gansu Province in China. Theoret. Appl. Climatol. 139, 1137–1149. https://doi.org/10.1007/s00704-019-03041-1 (2020).
    46. Haojing, C. et al. Spatial patterns of climate change and associated climate hazards in Northwest China. Sci. Rep. 13, 10418. https://doi.org/10.1038/s41598-023-37349-w (2023).
    47. Jin, H. et al. Spatiotemporal evolution of drought status and its driving factors attribution in China. Sci. Total Environ. 958, 178131. https://doi.org/10.1016/j.Scitotenv.2024.178131 (2025).

    Funding
    This study was supported by the Gansu provincial key research and development program (23YFFA0018) and the Gansu Province Water Conservancy Research and Planning Project (23GSLK044).

    Author information

    Authors and Affiliations
    School of Civil and Hydraulic Engineering, Lanzhou University of Technology, Lanzhou, 730000, China: Huimin Hou, Di Lu, Junxing Bai, Feng Guo, Changjie Chen, Zhiqiang Bao & Haohao Li
    State Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, 730000, China: Dongmeng Zhou
    Gansu Provincial Hydrology and Water Resources Center, Lanzhou, 730000, China: Junde Wang
    Gansu Academy for Water Conservancy, Lanzhou, 730000, China: Yufei Cheng

    Contributions
    Conceptualization, H. M. H. and D. M. Z.; Data curation, Z. J. B. and H. H. L.; Formal analysis, J. X. B.; Investigation, J. D. W., Y. F. C. and F. G.; Methodology, H. M. H. and D. L.; Project administration, H. M. H. and D. M. Z.; Software, D. L. and C. J. C.; Supervision, J. D. W.; Validation, J. X. B., C. J. C., F. G. and H. H. L.; Visualization, D. L.; Writing – original draft, H. M. H. and D. L.; Writing – review and editing, D. L., D. M. Z., C. J. C. and H. H. L. All authors have read and agreed to the published version of the manuscript.

    Corresponding author
    Correspondence to Di Lu.

    Ethics declarations

    Competing interests
    The authors declare no competing interests.

    Ethics approval
    There are no ethical issues in choosing the Water-Receiving Area of the Tao River Diversion Project as the study region.

    Additional information

    Publisher’s note
    Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Rights and permissions
    Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

    About this article

    Cite this article
    Hou, H., Lu, D., Zhou, D. et al. The response of meteorological drought to extreme climate in the water-receiving area of the Tao river diversion project in China. Sci Rep 15, 42077 (2025). https://doi.org/10.1038/s41598-025-26162-2

    Received: 23 August 2025
    Accepted: 27 October 2025
    Published: 26 November 2025
    Version of record: 26 November 2025
    DOI: https://doi.org/10.1038/s41598-025-26162-2

  • in

    Analysis of spatiotemporal change characteristics of Poyang Lake from 1984 to 2021 based on GEE

    Abstract

    This study analyzed the spatiotemporal change characteristics of Poyang Lake from 1984 to 2021 using Landsat series satellite imagery and the JRC Global Surface Water dataset on the Google Earth Engine platform. The normalized difference water index combined with the Otsu method was used to extract the water area. The results indicated that from 1984 to 2021, the interannual variation of Poyang Lake’s water area followed a sequence of fluctuating decline, fluctuating rise, overall decline, and overall increase. Additionally, the lake areas in Yongxiu, Xinjian, Nanchang, and Poyang were the primary regions contributing to Poyang Lake’s overall area changes. The seasonal variation of Poyang Lake within a year was pronounced, with the area in summer larger than that in winter. Compared with 1984, 0.03% of the water area of Poyang Lake had disappeared permanently by 2021, and 8.45% of the water area had changed from permanent to seasonal. Lake area changes were jointly driven by climate change and human activities. Rising average annual temperature, agricultural irrigation, reclamation of surrounding lakes, and water conservancy engineering caused the reduction in lake area. Increased annual precipitation and the implementation of environmental protection policies were the main factors behind the increases in lake area.
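
The water-extraction step named in the abstract, NDWI followed by Otsu thresholding, can be sketched in a few lines. This is a NumPy-only illustration under stated assumptions: NDWI is taken in its common form (green − NIR) / (green + NIR), the band arrays are synthetic, and the helper names `ndwi` and `otsu_threshold` are hypothetical. The study itself ran on Google Earth Engine, whose API is not reproduced here.

```python
import numpy as np


def ndwi(green, nir):
    """Normalized difference water index: (green - NIR) / (green + NIR)."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    out = np.full(denom.shape, np.nan)  # NaN where the denominator is zero
    np.divide(green - nir, denom, out=out, where=denom != 0)
    return out


def otsu_threshold(values, bins=256):
    """Histogram threshold that maximizes the between-class variance."""
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weights = hist / hist.sum()
    w0 = np.cumsum(weights)                  # probability of the lower class
    w1 = 1.0 - w0                            # probability of the upper class
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    # Right edge of the best bin: values >= threshold form the upper class.
    return edges[np.argmax(between) + 1]


rng = np.random.default_rng(0)
# Synthetic reflectances (hypothetical): water is brighter in green, dark in NIR.
green = np.concatenate([rng.uniform(0.08, 0.12, 500), rng.uniform(0.04, 0.08, 1500)])
nir = np.concatenate([rng.uniform(0.01, 0.03, 500), rng.uniform(0.20, 0.40, 1500)])
index = ndwi(green, nir)
threshold = otsu_threshold(index)
water_mask = index >= threshold
print(f"Otsu threshold: {threshold:.3f}, water fraction: {water_mask.mean():.3f}")
```

Otsu picks the threshold from the data's own histogram, which is why it suits a lake whose water and land reflectances shift from scene to scene; it works best when the NDWI histogram is clearly bimodal.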


    Introduction

    Poyang Lake, the largest freshwater lake in China, plays a critical role in the middle and lower Yangtze River basin owing to its seasonal hydrological dynamics, discharge capacity, and water exchange characteristics. It contributes significantly to flood control and water storage, climate regulation, ecological balance, and biodiversity, and it also affects the economic development of surrounding areas [1]. Since the start of the 21st century, under the dual influence of global warming and intensified human activities, the storage capacity of Poyang Lake has declined and dry conditions have intensified, leading to gradual shrinkage of the lake area, frequent floods and droughts, ecosystem degradation, and biodiversity loss [2,3]. Since the 18th National Congress of the Communist Party of China, the country has attached great importance to the management and protection of rivers and lakes. The degradation of the Poyang Lake water body has been effectively alleviated, and the lake now shows a complex mix of slow degradation and gradual recovery. Long time series research on the spatiotemporal changes of Poyang Lake is therefore urgently needed to reveal its evolution characteristics and to support water resources management and socioeconomic development.

    In recent years, scholars have conducted numerous studies on dynamic monitoring of the Poyang Lake area, changes in hydrological characteristics, the evolution of aquatic vegetation, and the driving mechanisms of that evolution, providing an important scientific basis for the management and protection of Poyang Lake [4,5,6,7]. Liu Hao extracted water body information for Poyang Lake from remote sensing images using the water body index method and explored the changes in its water area, and their causes, from 1999 to 2019 [8].
    Wu Changxue used a series of Landsat remote sensing images from 1973 to 2018 to analyze changes in the water area of Poyang Lake during the dry season and identified the driving factors of its evolution [9]. Tian Biqing interpreted Landsat remote sensing images with a multi-band pixel value comparison method and combined them with measured runoff and land use data to monitor changes in the water area of Poyang Lake during the flood season. That study found that the lake area decreased during the flood season from 1977 to 2017 and that the main driving factors were the increase in runoff and human activities [10]. Long time series analysis requires acquiring, storing, and processing a large number of remote sensing images. Because traditional analysis methods are poorly automated, previous studies used few images, covered short time spans, and were time-consuming and labor-intensive [11].

    With Google’s cloud processing capabilities and its partnership with NASA, Google Earth Engine (GEE) users can call, process, and analyze massive amounts of remote sensing satellite imagery (such as Landsat and Sentinel) and other Earth observation data online. GEE therefore offers a new solution for long time series remote sensing big data research and has been widely used [12]. Based on the GEE cloud platform, this paper uses Landsat series satellite imagery from 1984 to 2021 and the Joint Research Centre (JRC) global surface water dataset to analyze the long time series change characteristics of the Poyang Lake water body, in order to provide a theoretical basis for the protection and management of Poyang Lake and to promote sustainable economic and social water use.

    Study area and data

    Overview of the study area

    Poyang Lake is located in the northern part of Jiangxi Province (28° 22′ N–29° 45′ N, 115° 47′ E–116° 45′ E).
    The lake extends 173 km from north to south. It receives water from five major rivers (the Xiu, Gan, Fu, Rao, and Xin Rivers) and, after regulation and storage, discharges into the Yangtze River, making it a typical river-connected lake (Fig. 1; the vector boundary of Poyang Lake comes from the National Science and Technology Infrastructure Platform, National Earth System Science Data Center, and the administrative division data come from the China Geographic Information Public Service Platform). Poyang Lake has a subtropical humid monsoon climate, and runoff into the lake is strongly seasonal: April to September is generally the flood season, and October to March is the dry season. The lake has long been described by the saying “floods are everywhere, and low water is one line” [13].

    Fig. 1 Geographical location of Poyang Lake. Map created using ArcMap (version 10.6; https://desktop.arcgis.com/en/arcmap/10.6/, Esri, Redlands, CA, USA).

    Data sources

    Landsat series satellite imagery

    The Landsat series of satellites has observed the Earth continuously for more than 50 years and provides the most widely used remote sensing data for long time series research [14]. This paper therefore uses code written on the GEE platform to call the Landsat Collection 2 Tier 1 dataset as the remote sensing data source. This dataset has undergone systematic radiometric correction, geometric correction, and accuracy correction. Formula 1 (from the United States Geological Survey (USGS) official website) is then required to convert the original digital number (DN value) into the actual surface reflectance value:

    $$\rho_{surf} = M_{\rho} \times DN + A_{\rho} \tag{1}$$
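    As a rough illustration of Formula 1, the following Python sketch applies the scaling factor (0.0000275) and offset (−0.2) that the paper reports for the OLI and TM sensors. The function names and the sample DN values are hypothetical and are not taken from the authors' GEE code.

```python
# Scaling factor and offset reported in the text for the OLI and TM
# surface reflectance products (assumed to apply to every band used here).
SCALE = 0.0000275
OFFSET = -0.2


def dn_to_reflectance(dn, scale=SCALE, offset=OFFSET):
    """Apply Formula 1: rho_surf = M_rho * DN + A_rho."""
    return scale * dn + offset


def scale_band(dn_values, scale=SCALE, offset=OFFSET):
    """Convert a sequence of DN values to surface reflectance."""
    return [dn_to_reflectance(dn, scale, offset) for dn in dn_values]


# Illustrative DN values only; they do not come from the study.
sample_reflectance = scale_band([8000, 10000, 20000])
```

    For MSS images, the paper reads the factor and offset from each image's metadata, so in this sketch they would be passed in through the `scale` and `offset` arguments instead of the defaults.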
    In Formula 1, \(\rho_{surf}\) is the surface reflectance; \(M_{\rho}\) is the reflectance scaling factor; \(DN\) is the original digital number; and \(A_{\rho}\) is the reflectance offset. For the OLI and TM sensors, the band-specific surface reflectance scaling factors provided on the USGS official website are used: the reflectance scaling factor and reflectance offset are 0.0000275 and −0.2, respectively. For the MSS sensor, the reflectance scaling factor and offset of the corresponding band are read from the metadata file of each image.

    Because of the limited operational lifespan of each satellite, data covering the study area are missing in some years. For example, Landsat 4/5 MSS (service life: 1972–1999) lacks data from 1985 onwards, and Landsat 5 TM (service life: 1984–2012) lacks data for 1984, 1985, and 2012. In addition, images acquired by Landsat 7 ETM+ (operating life: 1999–2021) after May 2003 were not considered because of missing data stripes caused by the failure of the satellite’s onboard scan line corrector. In total, 891 multispectral images of Poyang Lake were selected for analysis. These images, acquired between 1984 and 2021 by the Landsat MSS, TM, and OLI sensors, exclude the years 1985 and 2012 and have a cloud cover of less than 30%. They were used to calculate the water body area. Detailed parameters are presented in Table 1 and Fig. 2.

    Fig. 2 Number of images from different years.

    Table 1 Parameters of Landsat series satellite remote sensing images.

    JRC global surface water dataset

    The JRC global surface water dataset in the GEE cloud repository was generated from more than 4.7 million remote sensing images acquired by the Landsat 5/7/8 satellites between 1984 and 2021. Each pixel of the dataset is classified as water or non-water using an expert system classifier.
    Validation against more than 40,000 sample points showed that the classifier has a water body misclassification error of less than 1% and a missed classification error of less than 5%. The classification results are organized into monthly data for the entire period (1984–2021) and for two periods (1984–1999, 2000–2021) for dynamic change monitoring [15]. This paper uses three bands of the dataset (water occurrence frequency, change in water bodies between the two periods, and transitions in water body type over the entire period) to analyze the spatial dynamics of the Poyang Lake waters.

    Meteorological data

    To study the impact of climate on changes in the Poyang Lake area, this paper obtained daily observations for 1984–2021 from three meteorological stations around the study area (Lushan Station, Poyang Station, and Nanchang Station) from the National Meteorological Science Data Center (https://data.cma.cn/). The geographical distribution of the selected stations is shown in Fig. 1. The dataset contains daily average temperature and daily precipitation. Although every day has a record, some entries are anomalous: missing values are uniformly filled with “−9999”, and negative values appear in the rainfall data. Because such values would bias the subsequent statistical analysis, they were removed through year-by-year screening. After preprocessing, the temperature data were averaged station by station and year by year to obtain the annual average temperature of each station, and the precipitation data were summed station by station and year by year to obtain the annual precipitation of each station.
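    The screening and aggregation described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' procedure: the record layout `(year, temperature, precipitation)` and the function name `annual_climate` are assumptions made for the example.

```python
from collections import defaultdict

MISSING = -9999  # fill value for missing observations, as reported in the text


def annual_climate(records):
    """Aggregate one station's daily records into annual statistics.

    `records` is an iterable of (year, temperature, precipitation) tuples
    (a hypothetical layout). Fill values (-9999) and negative rainfall are
    skipped before averaging temperature and summing precipitation.
    """
    temps = defaultdict(list)
    rain = defaultdict(float)
    for year, temp, precip in records:
        if temp != MISSING:
            temps[year].append(temp)
        if precip != MISSING and precip >= 0:
            rain[year] += precip
    years = sorted(set(temps) | set(rain))
    return {
        year: {
            # None when a year has no valid temperature readings at all
            "mean_temp": sum(temps[year]) / len(temps[year]) if temps[year] else None,
            # 0.0 when a year has no valid precipitation readings
            "total_precip": rain.get(year, 0.0),
        }
        for year in years
    }
```

    Running the function once per station gives the per-station annual series that the paper then interpolates spatially.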
    Finally, given the characteristics of the available meteorological data, spatial interpolation was performed with the Inverse Distance Weighting (IDW) method in ArcGIS to generate annual mean temperature and annual precipitation data for the Poyang Lake region from 1984 to 2021.

    Research methods

    Calculation of normalized difference water index

    The Normalized Difference Water Index (NDWI) was proposed by McFeeters [16] in 1996 and has been in use ever since. It effectively suppresses vegetation information, weakens the influence of bare soil, buildings, and similar features, and highlights water information [17]. It is one of the most widely used methods for extracting water bodies from remote sensing images. For example, Ji Mengfei [18] compared the water body extraction performance of several water body indices on Landsat remote sensing images of Poyang Lake from different years and seasons. The results showed that NDWI achieved an average overall classification accuracy of 0.93 and an average Kappa coefficient of 0.80, making it the best-performing index for water body extraction in Poyang Lake. This paper therefore uses NDWI to extract water bodies. It is calculated as follows:

    $$NDWI = \left( Green - Nir \right) / \left( Green + Nir \right) \tag{2}$$
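    A per-pixel version of Formula 2 might look like the sketch below, where `green` and `nir` are surface reflectance values for the green and near-infrared bands. The function name and the handling of a zero denominator are assumptions for illustration; the paper's own calculation runs on the GEE platform.

```python
def ndwi(green, nir):
    """Formula 2 for one pixel: (Green - Nir) / (Green + Nir)."""
    total = green + nir
    if total == 0:
        return None  # index is undefined when both bands are zero
    return (green - nir) / total


# Illustrative reflectance pairs only (not values from the study):
# open water reflects more green than near-infrared, vegetation the reverse.
water_like = ndwi(0.08, 0.02)
vegetation_like = ndwi(0.05, 0.40)
```

    Positive values point toward water and negative values toward vegetation or soil, which is why a single threshold on NDWI can separate the two classes.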
    In Formula 2, Green is the green band and Nir is the near-infrared band. The corresponding band numbers for the different Landsat sensors are shown in Table 1. The NDWI calculation is implemented in code on the GEE platform.

    Calculation of water area

    Code written on the GEE platform calculates the lake water area from 1984 to 2021 in four main steps. The process is shown in Fig. 3.

    Screening and preprocessing of image data

    This paper selects the Landsat 4/5 MSS Tier 1 and Landsat 5 TM/Landsat 8 OLI surface reflectance (SR) Tier 1 datasets on the GEE platform and filters for Poyang Lake images with cloud cover below 30% in each study year. Using the OLI sensor as the benchmark, the green and near-infrared bands of the MSS and TM sensors are spectrally calibrated in two steps: spectral response function (SRF) matching and reflectance ratio fine-tuning. This ensures spectral consistency between sensors in the same band. Cloud removal and other processing operations are also applied to the acquired images.

    Synthetic annual NDWI

    The green and near-infrared bands are selected according to Table 1, the NDWI value is calculated for each pixel of the preprocessed images, and the per-pixel median is then computed year by year to synthesize annual NDWI data.

    Calculate threshold and extract water bodies

    The Otsu method [19] is used to calculate the segmentation threshold of the annual NDWI data automatically. A pixel whose NDWI value is greater than the threshold is classified as water; otherwise it is classified as non-water.

    Calculate the area of the water body

    The water pixels within the boundary of Poyang Lake are counted, the area of a single pixel is derived from the spatial resolution of the image, and the two are multiplied to obtain the annual lake water area.

    Fig. 3 Flow chart of Poyang Lake water area calculation based on the GEE platform and Landsat imagery data.

    The water area of Poyang Lake in each of the four seasons is calculated in the same way as the annual water area. Only the time filter needs to change: it is set to the three-month interval corresponding to each season of the year in question.

    Results and analysis

    Temporal dynamic changes in the waters of Poyang Lake

    Annual changes in water area

    From 1984 to 2021, the water area of Poyang Lake fluctuated significantly from year to year (Fig. 4). The maximum water area (3281.75 km²) occurred in 1992, and the minimum (1582.99 km²) was recorded in 2018. Using a five-year sliding mean as the local baseline, the water area in 1992 was 33.61% above the baseline for the same period, while the water area in 2018 was 31.04% below it. Further analysis reveals that changes in the water area of Poyang Lake are closely associated with the years in which national wetland protection policies and large-scale water conservancy infrastructure projects were implemented. Piecewise linear regression was therefore used to divide the variation in Poyang Lake’s water area during 1984–2021 into roughly four phases. First, the period from 1984 to 1996 was characterized by fluctuation with an overall declining trend. Although the water area in 1986, 1988, 1989, 1991, and 1992 increased to some extent, or even significantly, compared with the previous year, and this period contains the maximum value in the time series, the lake area decreased overall, by 850.30 km².
    Following this phase of fluctuating decline, the period from 1996 to 2003 saw fluctuation with an overall rising trend and was generally a period of high water levels for the lake. The water area exceeded 3000 km² in 1998, 2000, and 2003. The average water area during this period was 273.90 km² greater than the overall mean of 2427.65 km², and the water area in 2003 was 1078.48 km² larger than in 1996. Next came a period of overall decline from 2003 to 2013, which was generally a dry period. The average water area during this period was 221.87 km² below the overall mean of 2427.65 km², and the downward trend was significant: the water area decreased by 1456.22 km², an average annual decrease of 121.35 km². Finally, the period from 2013 to 2021 showed an overall rising trend. Although this phase includes the minimum value in the time series, the upward trend was significant: the water area increased by 1215.70 km², an average annual increase of 135.08 km².

    Fig. 4 Changes of Poyang Lake area during 1984–2021.

    Seasonal changes in water area

    To analyze how the area of Poyang Lake changes across the four seasons, the water area was calculated for spring (March to May), summer (June to August), autumn (September to November), and winter (December to February). Because image data were missing or of poor quality for some seasons in some years, 1984, 1993, 2002, 2009, 2017, and 2021 were selected for the analysis of seasonal changes within the year. The results are shown in Fig. 5.
    Overall, the water area of Poyang Lake shows a pronounced wet and dry cycle through the year: spring is the rising water period, when the lake water area gradually expands; summer is the high water season, when the water area peaks at more than 3000 km²; autumn is the receding season, when the water area gradually decreases; and winter is the dry season, when the water area falls below 1500 km² at its minimum.

    Fig. 5 Seasonal change of Poyang Lake area.

    Spatial dynamic changes in the Poyang Lake area

    From 1984 to 2021, permanent water bodies accounted for only 34.09% of Poyang Lake. They were mainly distributed in Jiujiang City, with small parts scattered in Shangrao City and Nanchang City (Fig. 6a). As Fig. 6b shows, changes in the water area over this period occurred mainly in Yongxiu County, Xinjian County, Nanchang County, and Poyang County, which saw both large-scale decreases and relatively obvious increases in area. In terms of water body transitions (Fig. 6c), new and lost permanent water bodies accounted for 1.31% and 0.03%, respectively; new and lost seasonal water bodies accounted for 7.98% and 0.79%, respectively; and 1.43% of water bodies changed from seasonal to permanent, while 8.45% changed from permanent to seasonal.

    Fig. 6 Surface water spatial dynamics in Poyang Lake, 1984–2021. a Water body occurrence frequency. b Changes in water bodies. c Water body transfer types. Maps created using ArcMap (version 10.6; https://desktop.arcgis.com/en/arcmap/10.6/, Esri, Redlands, CA, USA).

    Factors affecting changes in the water area of Poyang Lake

    Climate change factors

    Among natural factors, climate change directly affects the water cycle in lake basins. Rising temperature promotes evaporation of lake surface water.
    Increased precipitation replenishes lakes directly and also increases the runoff of inflowing rivers, replenishing lakes indirectly and thus affecting their water area. Pearson correlation analysis shows that, during 1984–2021, the water area of Poyang Lake was positively correlated with annual precipitation (correlation coefficient 0.175) and negatively correlated with average annual temperature (correlation coefficient −0.156); neither correlation was significant. Figure 7a and b are consistent with this analysis.

    Fig. 7 Climate change in Poyang Lake and the relationship between Poyang Lake water area and climate during 1984–2021. a The relationship between water area and precipitation. b The relationship between water area and temperature.

    Human activity factors

    Human activities can alter the water cycle of lake systems. The fluctuation of the Poyang Lake water area from 1984 to 1996 may have occurred partly because agricultural irrigation consumed water flowing into the lake and partly because enclosing parts of the lake for aquaculture encroached on the water area. Since 1998, the country has gradually implemented the policy of “returning farmland to lakes”. The Poyang Lake basin responded actively, resulting in an increase in lake area. Since 2003, the Three Gorges Project in the upper reaches of the Yangtze River has gradually begun impounding water, which has weakened the supporting effect of the lower Yangtze River on Poyang Lake; this may be the main reason for the overall low, dry state of Poyang Lake from 2003 to 2013.
    Since 2012, the Jiangxi government has implemented an ecological migration policy along Poyang Lake, which mainly includes measures such as prohibiting land reclamation around the lake. From 2013 to 2021, the water area of Poyang Lake generally showed an increasing trend, which may be related to the greater emphasis on, and strengthening of, environmental protection efforts.

    Conclusion

    This study analyzed the temporal and spatial variation of the Poyang Lake water area on the GEE platform. In terms of temporal dynamics, the water area fluctuated significantly during 1984–2021, following the pattern “fluctuating decline, fluctuating increase, overall decline, overall increase”. Within the year, seasonal variation was obvious: summer was the wet season and winter the dry season, and the water area in summer was more than twice that in winter. In terms of spatial dynamics, during 1984–2021 the area of Poyang Lake in Yongxiu County, Xinjian County, Nanchang County, and Poyang County changed significantly, with both large increases and large decreases. In terms of water body transitions, new and lost permanent water bodies accounted for 1.31% and 0.03%, respectively, and 8.45% of water bodies changed from permanent to seasonal. Changes in the water area of Poyang Lake are driven by climate change and human activities; the lake water area is negatively related to annual average temperature and positively related to annual precipitation, although neither correlation was significant.
    Agricultural irrigation, lake aquaculture, and water conservancy projects contributed to the reduction in lake area, and the implementation of environmental protection policies was one of the main factors behind the increase in lake water area.

    This study used remote sensing to analyze the spatiotemporal variation in the water area of Poyang Lake from 1984 to 2021. However, owing to limited time and capabilities, some limitations remain. For example, the 30 m spatial resolution of Landsat imagery limits the identification of small water bodies. In some years, persistent cloud cover during the rainy season limited the acquisition of sufficient spring and summer imagery, which may slightly affect the reliability of the core conclusions on interannual and seasonal fluctuations in water area. Future research will combine multi-source remote sensing data (such as high-resolution and radar images), exploit the characteristics of the different data sources to improve the accuracy and efficiency of monitoring dynamic changes in water bodies, and establish a comprehensive, multi-temporal system for monitoring those changes.
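    For readers who want to experiment with the thresholding and area steps described under Research methods outside GEE, the following pure-Python sketch shows one way they could be reproduced on a flat list of NDWI values. It is an illustration under stated assumptions, not the authors' code: the function names, the 256-bin histogram, and the default 30 m pixel size (the Landsat resolution quoted in the limitations above) are all choices made for this example.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # a constant image has no meaningful split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        hist[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]                  # pixel count in the lower class
        sum0 += centers[i] * hist[i]
        w1 = total - w0                # pixel count in the upper class
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


def water_area_km2(ndwi_values, pixel_size_m=30.0):
    """Count pixels above the Otsu threshold and convert the count to km²."""
    t = otsu_threshold(ndwi_values)
    n_water = sum(1 for v in ndwi_values if v > t)
    return n_water * pixel_size_m ** 2 / 1e6
```

    In this sketch the input would be the annual median NDWI values of the pixels inside the lake boundary; images with a different spatial resolution would need a different `pixel_size_m`.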

    Data availability

    All data generated or analyzed during this study are included in this published article.
    References

    Dai, X., Wan, R. R. & Yang, G. S. Hydrological rhythm changes in Poyang lake and their relationship with river-lake water exchange. Sci. Geogr. Sin. 34 (12), 1488–1496 (2014).
    Fu, D. J. et al. Development of remote sensing cloud computing platforms and their applications in Earth sciences. J. Remote Sens. 25 (1), 220–230 (2021).
    Guo, H., Zhang, Q. & Wang, Y. J. Characteristics, causes, and drought-flood patterns of hydrological changes in the Poyang lake basin. Acta Geogr. Sin. 67 (5), 699–709 (2012).
    Hu, Z. P. & Lin, Y. R. Evolution of aquatic vegetation in Poyang lake over 30 years and its driving factors. Resour. Environ. Yangtze Basin. 28 (8), 1947–1955 (2019).
    Hu, Z. Y. Monitoring Hydrological Characteristics of Poyang Lake Using Time-series Remote Sensing Data [M.Sc. Thesis]. East China University of Technology, Jiangxi, China (2023).
    Ji, M. F., Tang, J. & Gao, X. J. Spatio-temporal changes and driving factors of Poyang lake area based on Google Earth engine. J. China Hydrol. 41 (6), 40–47 (2021).
    Jiang, F. Y. Analysis of Water Area Changes in Poyang Lake Based on Long-term Remote Sensing Data [Ph.D. Dissertation]. Jiangxi University of Science and Technology, Jiangxi, China (2023).
    Le, Y., Liu, J. T. & Wen, H. Spatio-temporal change analysis of Poyang lake water body based on multi-source and multi-temporal images. J. Yangtze River Sci. Res. Inst. 41 (8), 1–8 (2024).
    Liu, H. et al. Dynamic monitoring of Poyang lake area from 1999 to 2019 based on Landsat imagery. J. East. China Univ. Technol. (Natural Sci. Ed.) 46 (1), 68–76 (2023).
    Liu, Y. Y. et al. Spatio-temporal characteristics of Taihu lake water area changes from 1984 to 2018 based on Google Earth engine. Chin. J. Appl. Ecol. 31 (9), 3163–3172 (2020).
    McFeeters, S. K. The use of the normalized difference water index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 17 (7), 1425–1432 (1996).
    Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9 (1), 62–66 (1979).
    Pekel, J. F. et al. High-resolution mapping of global surface water and its long-term changes. Nature 540 (7633), 418–422 (2016).
    Su, L. F., Li, Z. X. & Gao, F. A review of water extraction from remote sensing images. Remote Sens. Land Resour. 33 (1), 9–19 (2021).
    Sun, F. D. & Ma, R. H. Remote sensing monitoring of dynamic hydrological characteristics of Poyang lake. Acta Geogr. Sin. 75 (3), 544–557 (2020).
    Tian, B. Q. et al. Characteristics and driving factors of water area changes in Poyang lake during flood seasons under long-term time series. Res. Soil Water Conserv. 28 (4), 212–217 (2021).
    Wang, W., Ma, L. & Ge, Y. X. Spatio-temporal characteristics and trends of lake changes in Xinjiang from 1986 to 2019. Acta Ecol. Sin. 42 (4), 1300–1314 (2022).
    Wu, C. X. et al. Characteristics and driving factors of water area changes in Poyang lake during dry seasons over the past 40 years. J. Soil Water Conserv. 35 (3), 177–184 (2021).
    Ye, X. C. et al. Impact of climate change and human activities on runoff changes in the Poyang lake basin. J. Glaciol. Geocryol. 31 (5), 59–66 (2009).
    Funding

    This research was partially sponsored by the fund of the president of Gandong University (grant number YZJJ202201).

    Author information

    Authors and Affiliations

    Gandong University, Fuzhou, Jiangxi, China
    Huangao Qiu & Qiuxi Zhang

    Contributions

    Q. wrote the main manuscript text; Z. prepared Figs. 1–7. All authors reviewed the manuscript.

    Corresponding author

    Correspondence to Huangao Qiu.

    Ethics declarations

    Competing interests
    The authors declare no competing interests.

    Additional information

    Publisher’s note

    Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Rights and permissions

    Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
    About this article

    Cite this article

    Qiu, H., Zhang, Q. Analysis of spatiotemporal change characteristics of Poyang Lake from 1984 to 2021 based on GEE. Sci Rep 15, 41594 (2025). https://doi.org/10.1038/s41598-025-25435-0

    Received: 16 July 2025
    Accepted: 21 October 2025
    Published: 24 November 2025
    Version of record: 24 November 2025
    DOI: https://doi.org/10.1038/s41598-025-25435-0

    Keywords

    Long time series; Poyang Lake; Google Earth Engine; Normalized difference water index; Spatiotemporal change of surface water

  • in

    Zero-shot generalization for predicting viral concentrations and evaluating removal efficiencies across wastewater matrices

    Abstract

    Predicting viral particles on new, unseen data across wastewater matrices (WMs) in aerobic membrane bioreactor (AeMBR)-based wastewater treatment plants (WWTPs) remains an open challenge because of the process drifts involved in the treatment stages. Efficient data augmentation approaches based on Markov chain (MCM), Markov chain and multivariate Gaussian (MMCM), Gaussian mixture (GMM), and copula (CM) models were proposed to generate synthetic data from physicochemical parameters, virometry, and PCR-based methods. A dual-attention long short-term memory network (DA-LSTM) combined with the new generative models was proposed to predict viral particles and evaluate removal efficiencies across AeMBRs, thereby handling effluent processing drifts. The DA-LSTM combines attention mechanisms to adaptively adjust the feature weights and extend long-term memory, enabling accuracy and robustness across unseen WMs. The DA-LSTM framework was tested for predicting pepper mild mottle virus and enteric viral pathogens such as total virus and adenovirus in two regions of Saudi Arabia. The log removal values were evaluated from the estimated viral concentrations. The DA-LSTM model demonstrated significant adaptability to unseen data across different WMs, maintaining robust performance despite the effluent drifts. The results showed that DA-LSTM zero-shot generalization achieved remarkable viral particle prediction performance using MMCM, with a mean coefficient of determination (R²) of 0.91 and 0.97 across the sand and MBR wastewater matrices in region R1, respectively, and an R² of 0.97 across the chlorinated effluent treatment process in region R2. Tests on total viral prediction across municipal WWTPs located in two other regions of Saudi Arabia confirmed the DA-LSTM’s effectiveness in predicting viral particles across WMs and its ability to enhance zero-shot generalization performance at the regional level.


    Introduction

    Viral particle concentrations in treated wastewater effluents have recently gained widespread attention due to the increased interest in reuse applications to enhance water security and ensure sustainable water management resources1. As the demand for reclaimed wastewater increases, accurately predicting and removing viral particles in wastewater treatment plants (WWTPs) becomes more important for preventing food contamination, reducing health risks, and mitigating ecological impacts. In this context, monitoring viral particles across the treated wastewater effluents and assessing the resulting log removal values (LRVs) between the influent and effluent treatment concentrations are necessary to ensure safe water reuse. However, removing viral particles from reclaimed water is challenging due to their inherently complex nature and smaller size, making them harder to remove than other biological contaminants in conventional wastewater treatment. Wastewater-based epidemiology methods used to quantify bacterial and viral pathogens are often highly specific and time-consuming2,3, and do not reflect the measurement of the wastewater process variables in real time (e.g., water samples are collected on-site and then measured in off-site laboratory analysis4,5), thereby limiting their applications. There is an urgent need for efficient and rapid real-time monitoring approaches. Thus, the development of soft sensor algorithms to accurately predict viral particles and assess LRVs for contaminants in AeMBRs is important for supporting plant operators in water resource recovery facilities (WRRFs) and alleviating these issues. Model-based and learning-based approaches have been proposed to characterize the physicochemical and biological interactions between water quality parameters and contaminant concentrations, and to assess these concentrations5,6,7,8,9,10,11.
The physicochemical water quality parameters and concentrations measured in aerobic MBRs can be generally monitored and digitally processed continuously with appropriate sampling frequencies. Such bioreactor processes can be mathematically described or identified using ordinary differential equations from system identification techniques, empirical growth and mass conservation principles12. These model-based estimation approaches provide an efficient strategy for determining the variables of interest, including state, parameters, and unknown faults and disturbances7,13,14. However, the identification and model representation of accurate process models, along with the assumptions in the model derivation, such as microorganism concentrations and reaction rates of WWTP systems, are the bottlenecks of model-based bioreactor approaches. Additionally, there is no direct closed-form relationship between microbial/viral contaminants and biomass concentrations, leading to more complex model-based estimation problems. In several MBR processes, it is necessary to include microbial and viral population balance equations to capture mass transfer and cell division, and heterogeneous mixtures of cells/particles15. Building these complex reaction networks can be a challenging problem in modeling and estimation. Data-driven methods circumvent mechanistic models by learning input–output relationships and capturing dominant patterns in WWTPs (see, e.g., 5,8,9,10,16,17,18,19 and references therein). Learning-based methods have proven effective in processing input–output relationships from data and making predictions. In the context of WWTPs, some solutions have been proposed to estimate bacterial and viral concentrations using data-driven models. The studies in10,20 were among the first to propose estimating bacterial concentrations in WWTPs using different ML algorithms and advanced the prediction of microbial contaminants in wastewater dynamic processes. 
A sliding window neural network-based approach was proposed to predict bacterial concentrations in WWTPs20. Tree-based ML models were proposed to predict bacterial concentrations10. The key contributions in both ML algorithms were to identify the optimal combination of water quality input features for predicting bacterial concentrations, and then to provide an evidence-based strategy for investigating the transferability and generalizability of model-based estimation methods using dominant and minimum features to construct the aerobic membrane bioreactor model. However, the estimation of bacterial concentrations from model-based approaches comes with different challenges, including the observability conditions for nonlinear systems, which are not always guaranteed. Log removal values of the pepper mild mottle virus (PMMoV) and Norovirus GII particles were predicted using neural networks11. Flow virometry using tree-based machine learning models was proposed for rapid estimation of virus particles across various wastewater matrices21. These works advanced the prediction of bacterial and viral concentrations using datasets ranging from limited to representative, while considering the nature of membrane bioreactor (MBR) technologies (e.g., aerobic and anaerobic MBRs), and the appropriate type of ML models. However, they were limited to developing optimal models based on training, validation, and testing; consequently, they did not infer transferability for unseen testing sets, thereby limiting their model generalization capabilities in the presence of process drifts.

The lack of standard ML models capable of handling the time-varying characteristics of wastewater effluent matrices (e.g., process and distribution drifts on unseen datasets) hinders the development of effective data-driven models that generalize to unseen datasets.
Few works targeted the model generalization on unseen data for predicting bacterial and viral contaminants across various wastewater matrices and WWTPs. A recent study was conducted to estimate the bacterial concentrations across various AeMBR-based WWTPs using a calibration that relied on an out-of-distribution framework22. The calibration method showed accurate prediction performances on unseen datasets from WWTPs in two regions of Saudi Arabia. The calibration method or out-of-distribution testing22 shares the same core principle of the proposed zero-shot generalization, as the pre-trained model is applied directly to unseen data with a retraining phase or condition before a downstream model enhancement analysis of unseen datasets. These two forms of zero-shot generalization approaches do not rely on a specific adaptation mechanism. In23, the authors proposed a lifelong learning framework that demonstrated excellent improvements in model generalization accuracy by integrating a knowledge-based adaptation mechanism and local ML predictor on unseen test data to predict viral particles across various WWTPs. Despite recent efforts made to enhance predictive modeling throughout the calibration and model knowledge-based adaptation in WWTPs (see, e.g., 22,23), achieving accurate viral particle prediction results and evaluating removal efficiencies on unseen datasets through a source-to-target estimation principle remains challenging, and there is no guarantee of good prediction performance for microbial and viral concentrations with the standard isolated ML models10,24. This highlights the need to improve model generalization during deployment to ensure consistent and reliable performance. 
The zero-shot generalization (ZSG) framework, combining a synergistic dual-attention mechanism and a neural network model, emerges as an alternative solution to the above challenges, particularly in source-to-target prediction tasks.

Attention mechanisms have recently been proposed to significantly improve the original features extracted in deep learning models, thereby alleviating the long-dependency issue of most neural networks, including recurrent neural networks and long short-term memory (LSTM) 25,26,27,28,29. The purpose of the ZSG method based on the attention mechanism is to address the shortcomings of traditional time series prediction methods when they are faced with long-term dependencies and multiple driving sequences. The attention mechanism assigns weights to intrinsic features to build an appropriate ML model in which the assigned weights are transferred to the target prediction tasks30,31,32. These features can then help identify highly distinguishable features in a high-dimensional space, thereby increasing the accuracy of the prediction results and their generalization capabilities with unseen test datasets. ZSG techniques are needed to handle the challenges caused by the rapid shifts in effluent treatment conditions and to adapt effectively to unseen data. The ZSG combines a synergistic dual-attention (DA) mechanism framework with an LSTM model to test the generalization performance of the trained model with unseen datasets25.
The key advantage of synergistic DA-LSTM lies in its novel augmented generative models based on the Markov chain process, the global dependency of the effluent data distributions and feature spaces, and the long-term dependency on time series data using LSTM, ultimately improving the predictive modeling of unseen data via ZSG.

The present study paved the way for a ZSG technique based on a dual-attention mechanism and a novel generative model to predict total virus concentrations and associated viral particles across AeMBR wastewater matrices for the development of efficient and generalizable soft sensors. The performance of this ZSG framework was tested with four AeMBR-based WWTPs geographically located in four regions (R1, R2, R3, R4) in Saudi Arabia, aiming to predict unseen viral particles across wastewater matrix datasets from each of these WWTPs. The WWTP in region R1 treats a mix of municipal and industrial wastewater, while the WWTP in R2 treats municipal wastewater. The WWTPs in regions R3 and R4 are divided into two WWTP pilots (A/H) and (P1/P2) and treat municipal wastewater with a process similar to the WWTP in region R1, although with some modifications. The primary goal of this study was to validate the prediction of viral particles on unseen datasets using the DA-LSTM with generative models across the WMs in regions (R1, R2) and to extend it to the pilots (A/H) in region R3 and (P1/P2) in region R4 for further validation and comparison. The LRVs from the estimated influent and effluent viral concentrations were also evaluated. The LRV results of the AeMBR-based WWTPs in the three regions (R1, R2, R3) showed different virus removal characteristics that were highly dependent on the treatment processes, namely aerobic treatment (conventional activated sludge), sand filtration, membrane (MBR), and chlorination, and on the type of virus, including its size, structure, and morphological characteristics.
The key innovation of this work lies in predicting viral particles and evaluating the removal efficiencies using DA-LSTM zero-shot generalization with novel generative models by optimizing the source model across wastewater effluent drifts and WWTPs. The effluent process drifts were handled by fine-tuning the weights and parameters of DA and LSTM. The pre-trained model was built on the primary effluent source and then applied to a downstream zero-shot generalization on the second effluent. The prediction performance on unseen datasets was remarkably preserved, as slight differences in distribution shifts and dynamic changes occurred between the effluent source and target domains. A retraining phase is needed to streamline the viral particle prediction performance on new unseen datasets coming from the third or new clarifier. To the best of our knowledge, a DA-LSTM zero-shot generalization framework has not yet been developed to predict viral particles and evaluate removal efficiencies across various wastewater matrices and WWTPs.

Materials and methods

This section presents the DA-LSTM framework for predicting viral particles in new unseen datasets through source and target prediction tasks. The methodology included integrating four generative models to generate synthetic datasets from the measured datasets. It also provided a ZSG framework based on DA-LSTM to quantify the viral particles across various wastewater matrices and to assess the LRVs through the estimated viral particles.

Aerobic membrane bioreactor plants and sample collection

AeMBR systems have shown advantages for the reduction of pathogen presence in post-treated MBR wastewater effluent compared to conventional activated sludge processes1. Despite the low-particulate, high-quality effluents produced by AeMBR systems, a total reduction in pathogens, including viral and microbial, is often not achieved1,33.
The present study proposes a data-driven model to quantify viral particle concentrations across various wastewater matrices and to assess the log removal values of the viral species in four pilot AeMBR-based WWTPs. The description of each AeMBR-based WWTP, including its schematic representation and sampling points, is provided in the supplementary material (Texts S1.1–S1.4; Figs. S1, S2, S3, and S4). The water quality samples and viral particle concentrations were collected from AeMBR-based WWTPs geographically located in four different regions (R1, R2, R3, R4) in Saudi Arabia. The WWTPs in R3 and R4 each had two pilots, (A, H) and (P1, P2), respectively, located within the same region. Physicochemical water quality parameters such as pH, total dissolved solids (TDS), electrical conductivity, total suspended solids (TSS), turbidity, ammonium nitrogen (NH4-N), nitrate nitrogen (NO3-N), nitrite nitrogen (NO2-N), and chemical oxygen demand (COD) concentration were measured in the four regions (R1, R2, R3, R4) (Table S1, Supplementary Information). Human total virus (TV), which reflects overall viral diversity regardless of viral genera, and adenovirus (AdV) were chosen as predictive parameters for enteric viral pathogens. PMMoV was chosen as the viral indicator. For more details related to the equipment and the collection of the initial samples, we refer the readers to 34, which provides a detailed analysis and processing of all the parameters involved in the source-tracking of microbial pathogens in WWTPs. Flow virometry and PCR-based methods (RT-qPCR) were used to measure TV, AdV, and PMMoV concentrations (Table 1) in regions R1 (Text S1.1 and Fig. S1, Supplementary Information) and R2 (Text S1.2 and Fig. S2, Supplementary Information), while TV concentrations were measured in regions R3 (Text S1.3 and Fig. S3, Supplementary Information) and R4 (Text S1.4 and Fig.
S4, Supplementary Information) for WWTPs (A) and (H), and (P1) and (P2).

Table 1 Evaluation of the proposed generative models for the generated influent and aerobic effluent datasets using various quantitative measures, including the log removal value (LRV) for the MODON AeMBR-based WWTP in region R1. The best results, i.e., the lowest values of the quantitative error measures and the LRVs that best match the real data, are highlighted in bold.

Synthetic generative models and evaluation performances

The data generative models, namely Gaussian mixture models (GMMs) (Text S2.1.3, Supplementary Information), Markov chain models (MCMs) (Text S2.1.1, Supplementary Information), extended Markov chain models (MMCMs) (Text S2.1.2, Algorithm S1, Supplementary Information), and copula models (CMs) (Text S2.1.4, Supplementary Information), were proposed to generate synthetic datasets from the measured datasets, given the limited availability of real samples. The limited available data refers to the real input–output samples collected from WWTPs, whose number is restricted by low pathogen concentrations. This limited data is used to generate synthetic datasets. The MMCM generative model follows the generative Markov chain proposed in 24 by modifying the probability state transition and adding noise with appropriate mean and standard deviation levels (Algorithm S1, Supplementary Information). A detailed description of these generative models, including their schematic representations and algorithms, is provided in the supplementary material (Section S2.1, Supplementary Information). These data generative models have demonstrated a strong ability to generate realistic data samples and imitate complex systems, including chemical and biological treatment processes for synthetic data augmentation22. We generated approximately 2000 samples, which were reduced to 1800 after applying a contamination level of 0.10 to remove outliers.
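As a rough illustration of a Markov-chain generator with added noise, the following Python sketch jitters a short measured series, discretizes it into bins, estimates a bin-to-bin transition matrix, and samples synthetic values by a random walk. It is not the authors' Algorithm S1: the function names, the bin count, the noise level, the add-one smoothing, and the median-distance trimming (a stand-in for the outlier detector, which the text does not name) are all assumptions made for this example.

```python
import random
import statistics
from bisect import bisect_right


def fit_markov_chain(series, n_bins=5, noise_std=0.05, seed=0):
    """Estimate a bin-to-bin transition matrix from a noisy copy of `series`."""
    rng = random.Random(seed)
    # Jitter the real measurements with zero-mean Gaussian noise (assumed level).
    noisy = [x + rng.gauss(0.0, noise_std) for x in series]
    lo, hi = min(noisy), max(noisy)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant series
    edges = [lo + k * width for k in range(n_bins + 1)]
    # Map each value to a bin index in [0, n_bins - 1].
    states = [min(bisect_right(edges, x) - 1, n_bins - 1) for x in noisy]
    # Count transitions; add-one smoothing keeps every row a valid distribution.
    counts = [[1.0] * n_bins for _ in range(n_bins)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    probs = [[c / sum(row) for c in row] for row in counts]
    return edges, probs, states[0]


def sample_markov_chain(edges, probs, start_state, n_samples, seed=0):
    """Random walk over the bins; each visit emits a uniform draw from that bin."""
    rng = random.Random(seed)
    n_bins = len(probs)
    state, out = start_state, []
    for _ in range(n_samples):
        state = rng.choices(range(n_bins), weights=probs[state])[0]
        out.append(rng.uniform(edges[state], edges[state + 1]))
    return out


def drop_outliers(samples, contamination=0.10):
    """Discard the `contamination` fraction of samples farthest from the median."""
    med = statistics.median(samples)
    keep = len(samples) - round(contamination * len(samples))
    ranked = sorted(range(len(samples)), key=lambda i: abs(samples[i] - med))
    kept = set(ranked[:keep])
    # Preserve the original ordering of the retained samples.
    return [x for i, x in enumerate(samples) if i in kept]
```

With these assumptions, sampling 2000 values and trimming at a contamination level of 0.10 leaves 1800 values, which mirrors the sample counts reported above; it says nothing about how closely the sketch would reproduce the published datasets.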
It is important to note that a representative dataset with a satisfactory ratio between the real and synthetic datasets was generated. This ratio is adequate to provide remarkable prediction and generalization performances of viral particles across wastewater matrices and WWTPs. The generative models were carefully designed to ensure close distribution matching between the synthetic and original samples of the generating datasets, thereby guaranteeing data integrity while avoiding biases and inaccuracies. Qualitative and quantitative evaluation measures were proposed to ensure data integrity while controlling overfitting and avoiding data contamination or biases. First, the evaluation results included qualitative similarity performances based on principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) between the real and synthetic datasets. Second, four quantitative metrics were proposed to evaluate the similarity or dissimilarity between the real and synthetic datasets. These metrics included the maximum mean discrepancy (MMD), Fréchet inception distance (FID), Wasserstein distance (WD), and energy distance (ED). Third, the conventional log removal value was proposed for the first time as a quantitative measure to evaluate the difference in virus concentration between untreated and treated water, thereby ensuring the consistency and accuracy of the real and synthetic datasets.

A series of experiments was conducted to select the architectures and hyperparameters of the generative models so as to effectively generate synthetic data from the available measurements of water quality and flow cytometry–PCR. The datasets comprised nine input variables and one or three viral particle output concentrations in R1 and R2 and TV particles in R3 and R4; both input and output variables contained limited real samples for each WWTP (Table S1, Supplementary Information).
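Two of the quantitative checks listed above, the Wasserstein distance and the energy distance, together with the log removal value, can be illustrated for a single variable with the standard library alone. This is a minimal sketch under stated assumptions: the function names are hypothetical, the samples are one-dimensional and (for the Wasserstein distance) equally sized, and the energy distance is returned in its squared form. MMD and FID are omitted because they need a kernel or a feature extractor whose settings are not given in the text.

```python
import math


def wasserstein_1d(real, synth):
    """1-Wasserstein distance between two equally sized 1-D samples."""
    if len(real) != len(synth):
        raise ValueError("this simple form needs equally sized samples")
    # For equal sizes the optimal coupling matches sorted values one to one.
    return sum(abs(a - b) for a, b in zip(sorted(real), sorted(synth))) / len(real)


def energy_distance_sq(real, synth):
    """Squared energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    def mean_abs(u, v):
        return sum(abs(a - b) for a in u for b in v) / (len(u) * len(v))
    return 2 * mean_abs(real, synth) - mean_abs(real, real) - mean_abs(synth, synth)


def log_removal_value(c_influent, c_effluent):
    """LRV = log10(C_influent) - log10(C_effluent), concentrations on a linear scale."""
    return math.log10(c_influent) - math.log10(c_effluent)
```

Both distances are zero for identical samples and grow as the synthetic distribution drifts away from the real one, while comparing the LRV computed from real and from synthetic concentrations gives a process-level check in the same spirit as the third measure described above.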
In the data preprocessing stage, the features were normalized in all cases to ensure data quality and make different features comparable. All values for the viral particle concentrations were converted to the log scale (i.e., log10 VP/L). For each prediction of viral particles in the model development and ZSG performance on unseen testing datasets, we generated approximately 2000 samples for the influent treatment process and wastewater effluent matrices (Table S1). Figure 1 shows the qualitative similarity results based on principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) between the real and synthetic data of the influent and aerobic effluent treatment processes for the MODON AeMBR-based WWTP in region R1. MMCM and CM-based generated samples exhibited a close match to the distribution of the real samples (Fig. 1). MMCM and CM performed well in terms of robustness, computational efficiency, dissimilarity, and discriminability by maintaining a good trade-off between qualitative and quantitative measures, as illustrated in Fig. 1 and Table 1. These results demonstrated the significant advantage of MMCM and CM in ensuring strong similarity performances in generating synthetic datasets that closely match the original datasets and represent true WWTP system variability. Although these generative models rely on a distribution to describe the occurrence of input–output values, MMCM intrinsically preserves the complex temporal dynamics, which is essential when generating large datasets.

Fig.
1 Similarity results between real and synthetic data: case studies of the influent and aerobic effluent treatment processes for MODON AeMBR-based WWTP in region R1 using PCA and t-SNE plots: (a) PCA of the influent; (b) t-SNE of the influent; (c) PCA of the aerobic effluent; (d) t-SNE of the aerobic effluent.

Table 1 provides several quantitative evaluation scores (MMD, FID, WD, and ED) to evaluate the proposed generative models and assess the quantitative similarity or dissimilarity between the real and synthetic datasets. Overall, the MMCM and CM generative models outperformed the MCM and GMM generative models on the MMD, FID, WD, and ED evaluation measures. In addition, the conventional log removal value (LRV),

$$\text{LRV}=\log_{10}\left(C_{\text{influent}}\right)-\log_{10}\left(C_{\text{effluent}}\right)$$

which evaluates the difference in virus concentration between untreated and treated water, was proposed to ensure the consistency and accuracy of the real and synthetic datasets. The quantitative LRV assessment of the generative models showed that the MMCM- and CM-based LRVs were 0.54 and 0.59 for TV, 1.17 and 1.17 for AdV, and 1.16 and 1.16 for PMMoV, against real LRVs of 0.59, 1.10, and 1.35, respectively. Notably, the MMCM and CM generative models showed LRVs more similar to those of the real datasets than the MCM and GMM generative models and achieved better LRV performance across the TV, AdV, and PMMoV concentrations. These results demonstrated the significant advantage of MMCM and CM in ensuring strong similarity performances in generating synthetic datasets that closely match the original datasets and represent true WWTP system variability.

Feature correlation

Reducing the redundancy between input features or variables in the feature selection stage is crucial to developing consistent and accurate machine learning models.
This step was conducted using Pearson’s correlation metric, which analyzes the linear dependency between variables. It is formulated as follows:

$$r=\frac{\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2}\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)^{2}}}$$

where \(x_{i}\) and \(y_{i}\) are the samples of features \(x\) and \(y\), and \(\overline{x}\) and \(\overline{y}\) are the mean values of the features \(x\) and \(y\). Two input features are highly correlated when \(|r|\) is close to 1. Pearson correlation helps identify strong linear relationships between input features and reduce unnecessary redundancy between input variables in the data preprocessing stage. Figure S5 illustrates the correlation between the features of the real and generated influent and effluent datasets in region R1.

The \(r\)-value between any two input variables of the real data and of all generated datasets did not exceed the threshold of \(r=0.99\) used in 8 to eliminate features (Fig. S5), so no feature was removed before the model development stage. These correlation results demonstrated that the proposed generative models performed well, which indicates the accuracy of these models in avoiding multicollinearity and preserving the distributions between the original and synthetic generated input features (Fig. S5). Preserving all the input features is particularly important for developing machine learning models in the source domain and achieving zero-shot generalization in the target domain across various wastewater treatment matrices.

DA-LSTM zero-shot generalization based on generative models

The DA mechanism comprises input and temporal attention layers providing key information to the neural network by assigning and weighting essential features and ignoring the contribution of irrelevant factors25,26,27 (Text S2.3.1, Supplementary Information).
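The weighting idea behind the two attention layers can be made concrete with a softmax over relevance scores. The sketch below is a simplified illustration, not the DA-LSTM of this study: in a trained network the scores come from learned layers that look at the encoder and decoder states, whereas here they are passed in as plain numbers, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def input_attention(x_t, scores):
    """Reweight the input features observed at one time step."""
    weights = softmax(scores)
    return [w * x for w, x in zip(weights, x_t)], weights


def temporal_attention(hidden_states, scores):
    """Blend encoder hidden states (one vector per time step) into a context vector."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights
```

A feature or time step with a much lower score than the others receives a weight close to zero, which is how irrelevant inputs are suppressed; equal scores reduce both functions to plain averaging.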
The core of the DA-LSTM method is to introduce two attention mechanism stages to address the shortcomings of traditional time series prediction methods when facing long-term dependencies and multiple driving sequences (Text S2.3.1–S2.3.3, Supplementary Information).

In the input stage, DA uses the input attention mechanism to dynamically select the external driving sequence that is most relevant to the prediction (Text S2.3.3, Supplementary Information). Specifically, by calculating the correlation with the encoder’s previous hidden state, the model assigns an attention weight to each driving sequence so that irrelevant information in the input sequence is suppressed, enhancing the model’s ability to focus on useful features. In the temporal stage, DA further processes the encoder’s hidden states through the temporal attention mechanism. This mechanism relies on the current state of the decoder to weigh the encoder’s hidden state at each time step and produce a weighted context vector that contains the most relevant long-term dependency information in the time series for the decoder to generate the final prediction. The introduction of the dual-stage attention mechanism, along with the Markov chain and random walk generative model, enables the model to effectively select important input features while fully capturing the long-term temporal dependencies of time series data, thereby improving prediction accuracy.

The description of the DA-LSTM framework and the attention mechanisms and parameters involved, as well as its algorithm, are detailed in the supplementary material (Section S2.3, Supplementary Information). The DA-LSTM algorithm, based on the generative models, is provided in Algorithm S2, and its schematic representation, based on the MMCM generative model and its architecture, is depicted in Fig. 2.

Fig. 2 Schematic representation of the DA-LSTM based on the novel MMCM generative model.
The flowchart includes three main parts: MMCM, DA-LSTM algorithm, and zero-shot generalization. The MMCM part is mainly divided into three sections: (a) Added Gaussian noise: data enhancement by adding a Gaussian noise distribution to the real data; (b) Data discretization: discretize the data and divide it into buckets; (c) Markov chain generator: construct a Markov chain to generate data through a random walk. The DA-LSTM part is mainly divided into two sections: (d) input attention: assigning weights to the input features of the current time step and generating a weighted feature vector helps the model select the most important input features at each time step; (e) temporal attention: assigning weights to the encoder hidden states at all time steps and aggregating them into a vector helps the decoder focus on the most relevant time step information in the encoder when predicting or updating the model; (f) Zero-shot generalization: test the generalization performance of the trained model with unseen datasets and evaluate the results through different indicators.

The effluent process drifts in the AeMBRs emerged with the underlying dynamic changes, multiple time scales, and distribution shifts on unseen datasets. The developed ML model must handle wastewater process drifts in soft sensor development to ensure globally representative and effective real-time prediction of viral concentrations and removal efficiencies, thereby capturing the complex interdependencies of water quality and viral particles in model development and model generalization tasks. In the DA-LSTM zero-shot generalization framework, the effluent process drifts were handled by fine-tuning the weights and parameters of DA and LSTM. The prediction performance on unseen datasets was preserved as slight differences in distribution shifts and dynamic changes occurred between the effluent source and target domains.
Additionally, the model was built on the primary effluent source before a downstream zero-shot generalization on the second effluent. A retraining phase is needed to streamline the viral particle prediction performance on new unseen datasets coming from the third or new clarifier.

Model setting

For each model’s development in each region, the baseline effluent treatment process from the first filtration layer was selected, and its dataset was partitioned into 80% training and 20% testing sets. The first filtration layer was chosen to ensure the robustness of the ZSG across various wastewater effluent matrices and facilitate streamlining of the soft sensor modeling to provide real-time readings on how viral particles and associated viral concentrations would persist in the effluents of AeMBR-based WWTPs and their LRVs in practice. All experiments for the DA-LSTM model were conducted in a Python 3.11.9 environment to predict the viral concentrations from the water quality inputs, with both training and inference performed exclusively on a CPU. The key software components include TensorFlow 2.x for the deep learning architecture, NumPy, Pandas, and scikit-learn for data handling and evaluation, and Matplotlib for visualization.

Hyperparameter optimization and model evaluation

In the model training, the mean square error (MSE) of the testing sets was used as the loss function, while for ZSG, the coefficient of determination (\(R^{2}\)) was chosen as the evaluation metric and objective function for the iterative optimization.

$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-\widehat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)^{2}},\quad \text{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\widehat{y}_{i}\right)^{2}$$

The combination of hyperparameters was derived through multiple experiments and validated to yield optimal results, providing strong support for the core structure of the model.
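The 80/20 partition and the two formulas above translate directly into a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes a chronological split without shuffling, which the text does not specify, and it is not taken from the authors' code.

```python
def split_train_test(samples, train_fraction_num=4, train_fraction_den=5):
    """Split a sequence into the first 80% (training) and the last 20% (testing)."""
    cut = (train_fraction_num * len(samples)) // train_fraction_den
    return samples[:cut], samples[cut:]


def mse(y_true, y_pred):
    """Mean square error: average of the squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives an MSE of 0 and an R² of 1, whereas always predicting the mean of the observations gives an R² of 0, which is a useful reference point when reading the R² values reported in the tables.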
The optimal hyperparameter settings of the DA-LSTM and LSTM models in all experiments included the Adam optimizer with a learning rate of 0.001, a batch size of 128, a hidden layer dimension of 64, and a number of training epochs set to 50. The coefficient of determination (\(R^{2}\)), RMSE, MAE, and MAPE were used to evaluate the performance of the DA-LSTM framework for predicting viral particles.

$$\text{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\widehat{y}_{i}\right)^{2}},\quad \text{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\widehat{y}_{i}\right|,\quad \text{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_{i}-\widehat{y}_{i}\right|}{\left|y_{i}\right|},$$

where \(y_{i}\) refers to the real output at sample \(i\), \(\widehat{y}_{i}\) denotes the predicted output at sample \(i\), \(\overline{y}\) is the mean value, and \(n\) is the number of dataset samples.

Results

This section describes the case studies conducted to predict total virus (TV), adenovirus (AdV), and PMMoV particles using the DA-LSTM framework and assesses the estimated LRVs across the wastewater matrices located in four regions (R1, R2, R3, R4) of Saudi Arabia, according to the optimal hyperparameters provided in the Materials and Methods section. For each ZSG process, we built the baseline model from the first effluent clarifier, and then we performed ZSG across the second and third clarifiers from its source model (i.e., baseline model). To assess the LRV in each experiment, we developed an influent model corresponding to the baseline model following the conventional LRV calculation. The discussion of the ZSG performance is mainly focused on the coefficient of determination \(R^{2}\) values, but we also utilized the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) to assess the performance from different angles.

Target viral particle prediction across wastewater matrices in region R1
    The AeMBR-based WWTP located in region R1 includes a primary clarifier influent and the aerobic, sand, and MBR treatment processes (Text S1.1 and Fig. S1, Supplementary Information). Using the DA-LSTM approach, we first derived the baseline model (i.e., training and testing) of the aerobic effluent to predict TV, AdV, and PMMoV particles, and then performed zero-shot generalization (with unseen testing datasets) to predict the viral particles in the target sand filtration effluent and assessed the estimated LRVs. Second, we evaluated the effectiveness of the DA-LSTM for predicting the viral particles in the target MBR effluent. Figure 3 illustrates the model development, zero-shot generalization (ZSG), and LRV assessment for predicting TV particles across different wastewater matrices using DA-LSTM algorithms through the proposed generative models.

    Fig. 3 Flowchart illustrating the case study on predicting TV particles for the MODON AeMBR-based WWTP in region R1 in which the model development, zero-shot prediction, and LRV assessment are highlighted: (a) Model development included the training and testing of the viral particles through the influent and aerobic treatment processes; (b) ZSG using DA-LSTM consisting of validating the developed source model from the aerobic effluent datasets to predict TV particles across sand filtration and MBR wastewater treatment matrices (i.e., target unseen datasets); (c) LRV assessment is derived through the estimated viral particle concentrations of the influent and effluent treatment processes.

    Aerobic effluent baseline model

    Owing to its predictive modeling and zero-shot generalization performance, we implemented the DA-LSTM to build the source model for TV, AdV, PMMoV, and MS2 particles from the generated datasets in region R1.
This primary model development stage was crucial for comprehensively evaluating the baseline model (i.e., source model) and comparing the proposed synthetic generative models in region R1. The aerobic effluent, corresponding to the first filtration layer, was chosen as a baseline model to facilitate streamlined zero-shot generalization. DA-LSTMMMCM and DA-LSTMCM demonstrated superior testing performance for TV, AdV, and PMMoV, with \(R^{2}\) values of 0.99, 0.99, 0.99 and 0.99, 0.98, 0.98, respectively (Table 2). These results are consistent with those in the model generation performance (see Materials and Methods), confirming that the MMCM and copula (CM) generative models closely replicate real data and enhance model development effectiveness. In the model development stage, DA-LSTMMCM achieved testing \(R^{2}\) values similar to those of DA-LSTMMMCM; however, its data generation performance was poor, which degraded its zero-shot generalization performance (see Materials and Methods). Overall, these results indicate that the DA-LSTMMMCM and DA-LSTMCM models are optimal for balancing data generation and model development and, consequently, are most suitable for zero-shot generalization. In addition, the training and validation loss curves showed no sign of overfitting to synthetic patterns (Figs. S6, S8, S10, and S12, Supplementary Information).

Table 2 Testing performance results of the DA-LSTM based generative models for the aerobic model in region R1 using different performance metrics.

Target viral particle prediction for unseen sand effluent datasets

Zero-shot generalization, often referred to as source and target prediction tasks, involves testing a developed baseline model (i.e., source model) on unseen datasets, where the weights of the developed dual-stage attention baseline model are transferred to the target datasets (i.e., unseen testing sets). 
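The source-to-target evaluation described above can be illustrated with a minimal sketch. It assumes a one-variable least-squares line as a stand-in for the DA-LSTM, purely to show the protocol: fit on source data, transfer the fitted parameters unchanged, and score on target data with the metrics defined in the Methods. The function names and the toy numbers are hypothetical, the \(R^{2}\) formula is the standard definition (the text names the metric but does not print its formula), and any retraining or calibration step is omitted.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; a toy stand-in for the source model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def predict(params, xs):
    """Apply transferred parameters to new inputs without refitting."""
    a, b = params
    return [a * x + b for x in xs]

def evaluate(y_true, y_pred):
    """RMSE, MAE, MAPE as defined in the Methods, plus standard R^2."""
    n = len(y_true)
    errs = [y - p for y, p in zip(y_true, y_pred)]
    mean = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errs)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errs) / n,
        "mape": sum(abs(e) / abs(y) for e, y in zip(errs, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical source (e.g., first effluent) and target (unseen effluent) data.
source_x, source_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
target_x, target_y = [5.0, 6.0, 7.0], [11.0, 13.5, 15.0]

params = fit_line(source_x, source_y)          # model development on source
zsg_scores = evaluate(target_y, predict(params, target_x))  # direct test on target
print(zsg_scores)
```

The target inputs lie outside the source range on purpose, so the scores reflect how well the transferred parameters hold under a shift in the inputs rather than in-sample fit.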
This testing procedure requires a retraining step or calibration, an adequate network design and fine-tuning of the DA weights and LSTM parameters. Since the aerobic effluent baseline model was used in the model development, the DA-LSTM approach inherits from the global dependency of the underlying data distributions, feature spaces as well as the relationships between features for the source and target effluent datasets to act as a single global model and generalize the viral particle estimates across WMs at the regional level. This process further shows the rationality of constructing the DA-LSTM model of the first filtration layer and generalizing its existing parameter knowledge to handle new unseen tasks, enabling zero-shot prediction of viral particles.In the zero-shot generalization process, we conducted tests to predict TV, AdV, and PMMoV particles in unseen sand effluent filtration datasets using the DA-LSTM approach. DA-LSTMMMCM and DA-LSTMCM achieved remarkable zero-shot generalization performance in each viral community and maintained robust performance across the unseen sand effluent datasets. The ({R}^{2}) values of the DA-LSTMMMCM and DA-LSTMCM models for TV, AdV, PMMoV and MS2 particles were 0.99, 0.80, 0.86, 0.99 and 0.87, 0.88, 0.80, 0.99, respectively (Table 3). In contrast, DA-LSTMGMM and DA-LSTMMCM showed poor prediction performance (Table S2, Supplementary Information). The model development performance results of the aerobic baseline model and DA-LSTM based ZSG on the sand effluent treatment for predicting MS2 particles are illustrated in Fig. S13. The MMCM-generated synthetic data led to a more accurate and robust zero-shot generalization model for all viral particles. 
These findings demonstrate the reliability of the DA-LSTMMMCM model in predicting viral particles and ensuring zero-shot generalization across the sand filtration process.Table 3 Zero-shot generalization evaluations of the DA-LSTM-based generative models for predicting TV, AdV, PMMoV and MS2 particles on the unseen sand effluent data in region R1 from the corresponding aerobic effluent source models using different performance metrics. The R2 values highlighted in bold were used in the discussion of the generalization performance results.Full size tableTarget viral particle prediction on unseen MBR effluent datasetsWe performed tests on the unseen target MBR datasets to predict TV, AdV, and PMMoV particles from the source aerobic effluent model using the proposed DA-LSTM in the zero-shot generalization process. DA-LSTMMMCM and DA-LSTMCM achieved remarkable zero-shot generalization performance across the MBR treatment process and all the viral communities. The ({R}^{2}) values of the DA-LSTMMMCM and DA-LSTMCM models for TV, AdV, and PMMoV particles were 0.98, 0.94, 0.99 and 0.93, 0.89, 0.89, respectively (Table 4). The results confirmed the significant advantages of DA-LSTMMMCM and DA-LSTMCM models, with DA-LSTMMMCM standing out as the best model, ensuring reliability across the ultrafiltration treatment process. In contrast, DA-LSTMGMM and DA-LSTMMCM showed poor prediction performance, as illustrated in Table S3, Supplementary Information. Using DA-LSTMMMCM and DA-LSTMCM models, the training and testing performance results of the aerobic baseline model, as well as the ZSG on the MBR effluent datasets for predicting TV, AdV, and PMMoV concentrations, are shown in Figs. S7, S9, and S11, respectively. 
These results demonstrated the advantages of DA-LSTM algorithms in maintaining accurate performance in the presence of multiple time scales and dynamic changes in the effluent treatment processes and distribution shifts involving the wastewater matrix datasets.

Table 4 Zero-shot generalization evaluations of the DA-LSTM-based generative models for predicting TV, AdV, and PMMoV particles on the unseen MBR effluent data in region R1 from the corresponding aerobic effluent source models using different performance metrics. The values highlighted in bold were used in the discussion of the generalization performance results.

Log removal value estimation from target prediction

The LRV was used to evaluate the potential risks associated with reusing treated wastewater effluents. It is a key indicator of the virus removal efficiency in WWTPs and has been well studied in prior research1,35,36. The conventional LRV is calculated as follows:

$$\text{LRV}^{\text{MD}}=\log_{10}\left(\text{C}_{\text{influent}}^{\text{MD}}\right)-\log_{10}\left(\text{C}_{\text{effluent}}^{\text{MD}}\right) \tag{1}$$

$$\text{LRV}^{\text{ZSG}}=\log_{10}\left(\text{C}_{\text{influent}}^{\text{MD}}\right)-\log_{10}\left(\text{C}_{\text{effluent}}^{\text{ZSG}}\right) \tag{2}$$
    where ({text{C}}_{text{influent}}^{text{MD}}) and ({text{C}}_{text{effluent}}^{text{MD}}) represent the viral concentrations in the influent and effluent contributed by the model development, respectively. ({text{C}}_{text{effluent}}^{text{ZSG}}) values are the viral concentrations in the effluent contributed by the ZSG. Upon building the aerobic effluent model and performing zero-shot generalization across the effluent sand filtration, the DA-LSTM was trained and tested to predict the viral particles in the influent treatment process. This step is necessary for estimating and monitoring the LRVs of viral particles between the influent treatment and the aerobic and sand filtration processes.Figure 4 shows the estimated, generated, and real LRVs performance results for TV, AdV, and PMMoV concentrations across the aerobic, sand, and MBR wastewater matrices in the MODON AeMBR-based WWTP in region R1. The estimated LRVs obtained from (1)-(2) using DA-LSTMMMCM and DA-LSTMCM models demonstrated excellent prediction performance across all viral particle concentrations and wastewater matrices with the actual generated LRVs. The estimated reductions of TV, AdV, and PMMoV particles achieved by DA-LSTMMMCM were 0.53, 1.17, 1.13 LRVs with the aerobic treatment process, 0.37, 0.44, 0.25 LRVs with the sand filtration, and 0.43, 0.77, 0.85 LRVs with the MBR treatment. The TV, AdV, and PMMoV particles showed 30%, 62%, 78% and 19%, 56%, 25% estimated viral reductions from the aerobic treatment to sand filtration, and the aerobic treatment to MBR treatment, respectively. The aerobic treatment was successful in reducing the AdV and PMMoV loads by a 1-log reduction, which showed precise alignment with the synthetic and ground truth datasets. MBR treatment achieved relatively decent viral retention over time, also aligning with the generated and ground truth datasets. 
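As a worked illustration of Eqs. (1) and (2), the following sketch computes an LRV from an influent and an effluent concentration and then sums the per-stage LRV estimates quoted above for DA-LSTMMMCM in region R1. The function name `lrv` and the example concentrations are hypothetical; only the per-stage values come from the text.

```python
import math

def lrv(c_influent, c_effluent):
    """Log removal value: log10(C_influent) - log10(C_effluent)."""
    return math.log10(c_influent) - math.log10(c_effluent)

# Hypothetical concentrations for illustration only (not from the study):
# a drop from 1e6 to 1e4 particles per unit volume is about a 2-log reduction.
example = lrv(1.0e6, 1.0e4)

# Per-stage LRV estimates quoted in the text (DA-LSTM with MMCM, region R1).
stage_lrv = {
    "TV":    {"aerobic": 0.53, "sand": 0.37, "MBR": 0.43},
    "AdV":   {"aerobic": 1.17, "sand": 0.44, "MBR": 0.77},
    "PMMoV": {"aerobic": 1.13, "sand": 0.25, "MBR": 0.85},
}

# Cumulative LRV per virus is the sum over the three treatment stages.
cumulative = {virus: round(sum(stages.values()), 2)
              for virus, stages in stage_lrv.items()}
print(example, cumulative)
# cumulative -> {'TV': 1.33, 'AdV': 2.38, 'PMMoV': 2.23}
```

Because LRVs are differences of logarithms, they add across stages, which is why the per-stage estimates can be summed directly into a cumulative value.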
The cumulative LRV sum of TV, AdV, and PMMoV particles over the three treatment processes using DA-LSTMMMCM attained an average of 1.33, 2.38, and 2.23, respectively. The cumulative LRVs of the TV, AdV, and PMMoV particles were consistent with the ground truth values of 1.26, 1.9, and 2.07, respectively (Fig. 4).

Fig. 4 Estimated and actual log removal values of TV, AdV, PMMoV, and MS2 concentrations for the aerobic, sand, and MBR wastewater matrices of the MODON AeMBR-based WWTP in region R1: (a) LRV of TV particles; (b) LRV of AdV particles; (c) LRV of PMMoV particles; and (d) LRV of MS2 concentrations. The aerobic effluent was chosen as a baseline model in the model development to facilitate streamlined DA-LSTM zero-shot generalization of TV, AdV, and PMMoV across sand and MBR wastewater matrices, and MS2 across the sand treatment process. The LRV results were obtained by deriving the influent model in region R1 to assess the viral concentrations and evaluating the difference in virus concentration between untreated and treated water.

Target viral particle prediction in region R2
    The pilot AeMBR-based WWTP in region R2, namely (K-WWTP), includes a primary clarifier influent, and two effluents: secondary clarifier (effluent) and chlorinated effluent (Chlor. effluent) treatment processes37 (Text S1.2 and Figure S2, Supplementary Information). We derived the baseline model of the secondary effluent for predicting TV, AdV, and PMMoV particles, which is essential for the ZSG across the chlorination effluent treatment. We then conducted ZSG for the target chlorination effluent to assess the viral particles and estimated the LRVs using DA-LSTMs algorithms. The ZSG accuracy showed remarkable prediction performance across TV, AdV, and PMMoV particles, with an average ({R}^{2}) of 0.97 for DA-LSTMMMCM and DA-LSTMCM (Table 5). The training and testing performance results of the effluent baseline model, as well as the ZSG on the chlorinated effluent datasets for predicting TV, AdV, and PMMoV concentrations using DA-LSTMMMCM and DA-LSTMCM models, are shown in Figs. S14, S15, and S16, Supplementary Information, respectively. These results indicate a clear advantage of DA-LSTM algorithms in predicting viral concentrations on completely unseen chlorination effluent datasets with underlying effluent and chlorination effluent drifts.Table 5 Zero-shot generalization evaluations of the DA-LSTM-based generative models for predicting TV, AdV, and PMMoV on the unseen K-WWTP chlorination effluent (Chlor. effluent) datasets in region R2 from their respective effluent source models using different performance metrics.Full size tableWe evaluated the LRVs based on the estimated models between the influent and effluents. Figure 5 shows the LRVs’ performance results between the estimated and actual values in region R2. The estimated LRVs derived from DA-LSTMMMCM and DA-LSTMCM models demonstrated remarkable performance with the synthetic and ground truth LRVs across all viral particle concentrations. 
For instance, we observed good agreement between the estimated LRV of the ZSG, the LRV of the synthetic data, and the LRV of the ground truth. The estimated LRVs of the TV, AdV, and PMMoV particles achieved with DA-LSTMMMCM were 0.19, 3.25, 1.07 for the secondary clarifier (i.e., effluent), and 0.29, 2.67, 1.60 for the chlorination treatment. The secondary clarifier and chlorination effluent treatments showed lower LRVs for the TV particles, achieving less than a 1-log reduction, while the LRVs’ contributions for PMMoV particles were close to a 1-log reduction. AdV achieved the highest viral retention, with a 3-log reduction for the secondary clarifier and a 2-log reduction for the chlorination effluent. The average cumulative LRVs of TV, AdV, and PMMoV particles over the two treatment processes using DA-LSTMMMCM were 0.48, 5.92, and 2.67, respectively. These results are consistent with the average cumulative LRV sum of TV, AdV, and PMMoV particles in the ground truth and synthetic datasets, which were 0.52, 5.90, 2.55 and 0.59, 6.12, 3.17, respectively, demonstrating the reliability and robustness of the ZSG using the DA-LSTMMMCM algorithm (Fig. 5).

Fig. 5 Estimated and actual log removal values of TV, AdV, and PMMoV concentrations for the aerobic, sand, and MBR wastewater matrices of the MODON AeMBR-based WWTP in region R2: (a) LRV of TV particles; (b) LRV of AdV particles; (c) LRV of PMMoV particles. The effluent was chosen as a baseline model in the model development to facilitate streamlined DA-LSTM zero-shot generalization of TV, AdV, and PMMoV particles across the chlorinated effluent treatment process. The LRV results were obtained by deriving the influent model in region R2 to assess the viral concentrations and evaluating the difference in virus concentration between untreated and treated water.

Target TV particle prediction in region R3
The pilot AeMBR-based WWTP in region R3 (AH-WWTPs) has two WWTPs, (A) and (H), geographically located in the same region (R3). These two pilots were designed to treat municipal wastewater38. Their processes are similar to those of region R1, with aerobic and sand filtration treatments, although with some modifications (Text S1.3 and Fig. S3, Supplementary Information). Similar to the ZSG process in region R1, we constructed the DA-LSTM models for the (A) and (H) WWTP systems from their aerobic treatments. We conducted ZSG on their respective unseen sand filtration datasets to predict TV concentrations and estimated the LRVs using DA-LSTM algorithms. We also performed ZSG with a cross-validation between the (A) and (H) WWTPs for TV concentrations and estimated the LRVs using the DA-LSTMMMCM and DA-LSTMCM models.

Target TV particle prediction based on sand filtration datasets

We conducted ZSG using DA-LSTMMMCM and DA-LSTMCM to predict each target TV concentration of the (A) and (H) sand filtration treatments from their respective TV aerobic models. The ZSG evaluations of the two DA-LSTM-based generative models on the unseen TV sand effluents (A) and (H) demonstrated excellent prediction performance across the two sand filtration matrices, with an average \(R^{2}\) of 0.99 in both cases (Table S4, Supplementary Information). Figures S17 and S18 illustrate the training and testing performance results of the TV particles in the aerobic treatment process (A) and the ZSG performance on the unseen sand filtration (A) using DA-LSTMMMCM and DA-LSTMCM algorithms. Both DA-LSTM algorithms were able to track effluent process drifts and showed reliable performance results across their corresponding wastewater effluent matrices. Figure 6 shows the estimated LRV of TV particles contributed by the model development and ZSG approaches across the wastewater effluent matrices for WWTPs (A) (Fig. 6a) and (H) (Fig. 6b) in region R3. 
We observed good convergence between the estimated LRVs of the model development and ZSG approaches, the LRVs of the generated synthetic data, and the LRVs of the ground truth (Fig. 6). Overall, the estimated LRVs across the two wastewater effluent matrices from the model development and ZSG varied from 0.04 to 0.54 logs, which did not exceed 1-log reduction, and met the LRVs of the ground truth ranging from 0.02 to 0.48 logs. This slight variation in TV removal efficiency across the two wastewater matrices may be attributed to the low contribution of the aerobic and sand treatment processes in the AeMBR-based WWTP.Fig. 6Performance results between the estimated, synthetic, and actual log removal values for the aerobic treatment in the model development and sand effluent in the ZSG of (A) and (H), the WWTPs in region R3: (a) LRV of TV particles for plant (A), the aerobic effluent (A) was chosen as a baseline model in the model development to facilitate streamlined DA-LSTM zero-shot generalization of TV particles across the sand effluent treatment (A); (b) LRV of TV particles for plant (H), the aerobic effluent (H) was chosen as a baseline model in the model development to facilitate streamlined DA-LSTM zero-shot generalization of TV particles across the sand effluent treatment (H). The LRV results were obtained by deriving the influent model in region R3 to assess the viral concentrations and evaluated the difference of virus concentration of untreated and treated water.Full size imageTarget TV particle prediction with cross-validation between (A) and (H) datasetsWe performed a cross-validation ZSG between plants (A) and (H) to assess the effectiveness of the proposed DA-LSTM algorithms in predicting TV concentrations across their respective wastewater matrices. The source models of the two plants were built from the aerobic effluent datasets of each plant. 
Then, we performed the ZSG of their corresponding aerobic effluent datasets in a cross-validation manner: aerobic treatments (A) and (H) were used in the model development, while aerobic processes (H) and (A) were used for ZSG, respectively, as shown in Table S5. Figures S19 and S20 show the ZSG performance of DA-LSTMMMCM and DA-LSTMCM on the unseen TV test datasets from their source TV aerobic models in region R3. DA-LSTMMMCM and DA-LSTMCM achieved excellent ZSG performance with \(R^{2}\) values ranging from 0.98 to 0.99 (Fig. S21, Supplementary Information). The estimated LRVs of DA-LSTMMMCM and DA-LSTMCM in the model development and ZSG stages are shown in Fig. 7. Notably, the estimated LRVs were in excellent agreement with the ground truth (Fig. 7), demonstrating the ZSG feasibility for efficiently predicting viral particles and handling effluent treatment process drifts across wastewater matrices from two WWTPs in the same location.

Fig. 7 Estimated and actual log removal values of TV concentrations based on sand filtration datasets: (a) Target TV particles performance in the aerobic (H) from the source TV particle aerobic model (A), the aerobic effluent (A) was chosen as a baseline model in the model development to facilitate streamlined DA-LSTM zero-shot generalization of TV concentrations across the aerobic effluent (H); (b) target TV particles in the sand (H) from the source TV particle sand model (A) in region R3: the aerobic effluent (H) was chosen as a baseline model in the model development to facilitate streamlined DA-LSTM zero-shot generalization of TV particles across the aerobic effluent (A). The LRVs were determined by deriving the influent model in region R3 to assess the viral concentrations and evaluating the difference in virus concentration between untreated and treated water.

DA-LSTM zero-shot generalization outperforms the state-of-the-art ML algorithms: Case study in region R4
    To investigate the impact of the DA mechanism on the DA-LSTM framework, we compared the DA-LSTM and state-of-the-art ML algorithms for predicting TV particles to better assess their source-to-target ZSG performance and the impact of the DA mechanism on model fitness. These standard ML included Artificial Neural Network (ANN), Extreme Gradient Boosting (XGBoost), Random Forest (RF), and LSTM methods. The comparison was performed on the new pilot AeMBR-based WWTP designed to treat municipal wastewater in region R4 (P1P2-WWTPs), which has two WWTPs, (P1) and (P2), in the same region R4. The treatments at these WWTPs were similar to the pilot AeMBR in region R3, with aerobic and sand filtration treatment processes (Text S1.4 and Fig. S4, Supplementary Information).Similar to the ZSG analysis in region R3, we built the DA-LSTM model for the baseline aerobic effluent treatment (P1), then we performed ZSG on the unseen aerobic effluent treatment (P2). We used MMCM and CM data generators to generate synthetic data for the model development and ZSG performance of DA-LSTM and standard ML algorithms to predict TV concentrations at (P1) and (P2) WWTPs. The mean square error results for predicting TV particles across the unseen aerobic (P2) from the source model (P1) showed that DA-LSTM outperformed the ANN, RF, XGBoost, and LSTM in region R4 (Fig. S22 and Table S6, Supplementary Information). Figure S21 shows the training, testing, and ZSG performance of the predicted and actual TV values for the LSTM and DA-LSTM models. Overall, DA-LSTMMMCM and DA-LSTMCM achieved excellent ZSG performance results, showing that the predicted values were closely distributed in the trend line and maintained their accuracy and consistency across the aerobic effluent treatment (P2) (Figs. 
S22, S23a–c, S23d–f, Tables S6 and S7, Supplementary Information).

The estimated LRVs of LSTM in the model development and ZSG stages did not match the ground truth, while the LRVs of DA-LSTMs remained consistent with the real samples (Fig. 8). For instance, the LSTM-estimated LRVs were 0.03 logs for LSTMMMCM and 0.07 logs for LSTMCM in the model development on the aerobic effluent treatment (P1), and 0.11 logs for LSTMMMCM and 0.13 logs for LSTMCM in the ZSG, as illustrated in Fig. 8. The LRVs for LSTMMMCM and LSTMCM showed average deviation errors of 79.4% and 67.5% from the model development and ZSG to the ground truth LRVs, respectively. These ZSG performance results demonstrate the significant superiority of DA-LSTM over LSTM in the context of this experiment.

Fig. 8 Estimated and actual log removal values of TV particles using DA-LSTM and LSTM algorithms for the target aerobic effluent datasets (P2) from the source aerobic effluent model (P1) in region R4: (a) DA-LSTMMMCM and DA-LSTMCM; the aerobic effluent (P1) was chosen as a baseline model in the model development to facilitate streamlined DA-LSTM zero-shot generalization of TV particles across the aerobic effluent (P2); and (b) LSTMMMCM and LSTMCM: the aerobic effluent (P1) was chosen as a baseline model in the model development to facilitate streamlined DA-LSTM zero-shot generalization of TV particles across the aerobic effluent (P2). 
The LRVs were determined by deriving the influent model in region R4 to assess the viral concentrations and evaluated the difference of virus concentration of untreated and treated water.Full size imageDiscussionThe efficiency of wastewater-based epidemiology techniques based on localized experiments for measuring viral particles is not globally representative and is often compromised due to sampling and analytical methods and, more importantly, differences in experimental systems, which delay the decision-making process. Isolated ML models for predicting viral particles are often limited to model development tasks, preventing model performance enhancement and generalization capabilities on unseen testing datasets from different wastewater matrices and WWTPs. Furthermore, the real-time prediction of viral particles should account for the rapid distribution shifts of the datasets and process drifts in the treatment conditions and variations in the experimentation. These challenges impede subsequent efforts to develop efficient zero-shot generalization methods to estimate viral particles and monitor log removal values across different wastewater matrices in real time.We conducted case studies to predict viral particles using a ZSG approach based on a DA-LSTM algorithm to generalize the model development process across wastewater matrices from different WWTPs in Saudi Arabia. The zero-shot generalization model relies on an extended Markov chain generative model (MMCM), which is inherited from the standard Markov chain model (MCM). The MMCM generative model is a diffusive model that includes noise propagation via the input–output variables with a tendency to move to zero and defines a probability distribution over time. 
Adding Gaussian noise to the MCM scheme with hierarchical priors generates high-quality synthetic samples that outperform even those with the Gaussian mixture model (GMM) and autoencoders, including generative adversarial networks (Text S2.1, Supplementary Information). Integrating the DA structure into the local LSTM predictor and the MMCM and CM generative models significantly enhanced ZSG performance for predicting viral particles on unseen datasets. Combining these DA, LSTM, and generative models significantly strengthened their abilities to adapt to time-varying characteristics and distribution shifts in WMs. This further highlighted DA’s contribution when faced with process drifts in the wastewater treatment process. The validation of the unseen test sets was based on a direct testing method. It did not intrinsically rely on an adaptation, simplifying the design while reducing the computational time.The DA-LSTM zero-shot generalization model relies on historical physicochemical and flow cytometry–PCR data. These datasets can be collected in a closed-loop operation with controlled and calibrated input variables or open-loop settings without the need for a state feedback mechanism. For instance, the flow rate, shock loads, or chemical dosing input changes can be controlled with fixed values at desired steady state conditions representing the optimal WWTP system state within an operating cycle. These feedback control or optimal control routes are often needed to enhance quality production and energy demands at different levels and to monitor process anomalies in fault diagnosis problems. Herein, the most essential part in the development of a soft sensor capable of predicting viral particles in open-loop or closed-loop settings generally relies on the available measurements of the collected historical data. The latter might contain all possible conditions of the AeMBR-based WWTPs including environmental and chemical changes, and process input variations. 
The absence of the dynamic inputs does not necessarily mean that the changes of the process inputs were not considered in the historical data. In the model development process, the control input changes, including state control and parametric control, do not affect the prediction and estimation performances of the data-driven soft sensor models. Although multiple time scales and temporal dynamics (i.e., effluent process drifts and time-varying behaviors) were involved across wastewater matrices and WWTPs, the proposed DA-LSTM significantly strengthened its ability to adapt to these time-varying changes in the model performance enhancement by assigning specific weights to the input features that are transferred to the target unseen testing data.DA-LSTM zero-shot generalization performance from various generative models and across different wastewater matricesThe key novelty of the proposed DA-LSTM algorithm lies in the generative models that infer the ability to handle unseen data sets. Four data generative models—GMM, MCM, MMCM, and CM—were proposed to generate synthetic datasets from the measured datasets. We evaluated these models with qualitative and quantitative metrics—including the LRVs across different treatment processes—to assess their performance in generating high-quality samples from various angles (Fig. 1). MMCM and CM greatly contributed to the model development and zero-shot generalization performance. The quantitative evaluation scores MMD, FID, WD, and ED metrics, which measure the effectiveness of the similarity or dissimilarity performance between real and synthetic generative datasets, confirmed the significant advantages of the generative models, with MMCM standing out as the best model. 
Further, the quantitative evaluation of the LRV across the wastewater matrices confirmed the consistency and accuracy of MMCM in maintaining the LRV of the synthetic data as close as possible to the ground truth (Table 1).In the model development and ZSG, DA-LSTM-based MMCM showed excellent source-to-target viral particles predictions with unseen datasets across various WMs from different WWTPs. The strong generalizability of DA-LSTMMMCM for predicting viral particles was confirmed in three wastewater effluent treatment processes (aerobic, sand, and MBR) in the AeMBR-based WWTP in region R1 (Tables 3 and 4; Fig. 4, Supplementary Tables S2 and S3; Supplementary Figures S7, S9, S11, and S13), the secondary clarifier and chlorinated effluent treatment processes in region R2 (Fig. 5; Table 5; Supplementary Figures S14–S16), and the two plants (A) and (H) of the pilot AeMBR in region R3 (Figs. 6, and 7; Supplementary Figures S17–S20; Supplementary Tables S4 and S5). Further, it was extended to compare the standard ML and DA-LSTM algorithms to assess their source-to-target TV particles performance and the impact of the DA mechanism on model fitness (Fig. 8; Supplementary Figure S23; Supplementary Tables S6 and S7). Overall, the results showed consistent prediction performance of the DA-LSTM algorithm across various generative models and WMs from different WWTPs in different regions.Estimated LRVs across different wastewater matricesThe current study investigated the estimated LRV differences from the target prediction using DA-LSTMMMCM across WMs from different WWTPs. The direct relationship between the estimated viral concentrations and LRV through cost effective water quality measurements allows faster and reasonably real-time monitoring of any deviations from the optimal operating points. 
The cumulative LRV sum of TV, AdV, and PMMoV particles using DA-LSTMMMCM across the aerobic, sand, and MBR treatment processes in region R1, and across the second clarifier and chlorinated effluent treatments in region R2, were consistent with the ground truth. The prediction consistency was verified in region R3 for predicting TV particles across the sand filtration process and in two pilots, (A) and (H) AeMBRs. The results revealed high consistency between the estimated LRVs and the ground truth values.LRV results across different treatment processes are known to be influenced by different factors, including the type of membrane rejection and biomass concentrations in virus removal. For instance, the model development and zero-shot generalization performance results in all cases showed that the cumulative TV removal efficiency was relatively small across all the WMs, while AdV and PMMoV removal efficiencies were substantially higher for the MBR second clarifier and chlorinated effluent treatment processes in region R2. This shows that TV, AdV, and PMMoV particles responded differently to treatment. TV particles were resilient and demonstrated lower removal efficiency rates, while AdV and PMMoV were more susceptible to the aerobic, MBR, and chlorination treatments. Soft sensor development allowed us to assess the TV, AdV, and PMMoV concentrations across the wastewater matrices based only on wastewater physiochemical parameters, thereby monitoring LRV efficiencies and differences between WMs at the same WWTP sharing the same wastewater source. 
The core of our findings highlighted the importance of handling wastewater effluent drifts in the soft sensor development to ensure globally representative and effective estimation of viral concentrations and removal efficiencies through indirect physiochemical water quality measurements across WMs, thereby, capturing the complex interdependencies of water quality and viral particles in the model development and zero-shot generalization stages.Zero-shot performance comparison with existing machine learning modelsWe conducted a comparison analysis for predicting TV particles using DA-LSTM and state-of-the-art ML algorithms in region R4. Specifically, we focused on a recent work that proposed Artificial Neural Network (ANN), and Random Forest (RF) models for predicting viral particles11, where the authors predicted log removal values of PMMoV and Norovirus GII concentrations. It is essential to note that the ML algorithms developed in11 do not consistently demonstrate generalization performance across various wastewater matrices. Furthermore, these results do not infer calibration and adaptation mechanisms on unseen test datasets from various wastewater matrices and WWTPs, thereby limiting their performance in the presence of effluent process drifts.LSTM performed better than standard ANN, XGBoost, and RF models; however, standard ML models showed poor ZSG predictions and significant errors on the unseen datasets, limiting their performances in handling effluent treatment process drifts and time-varying characteristics or shifts of wastewater matrices (Figs. S22, S23a–c, S23d–f, Tables S6 and S7, Supplementary Information). Although ANN, XGBoost, RF, and LSTM models performed well on the training and test data (Tables S6 and S7), their performances dropped sharply on the zero-shot generalization tasks, especially when processing new unseen inputs. 
These standard ML algorithms did not work correctly at this stage, which demonstrates that they still have limitations in capturing long-term dependencies and in processing the dynamically changing data encountered in new unseen input sequences. DA-LSTM was significantly better than standard ML models in all training, testing, and validation stages (Tables S6 and S7).

Conclusion

The proposed ZSG approach based on dual-stage attention marks a progressive step toward model generalization across unseen datasets in WWTPs, specifically across wastewater effluent matrices in the geographical region. The synergistic ZSG framework based on DA-LSTM relies on an extended Markov chain generative model to address the related issues of adapting and calibrating to new unseen testing datasets, thereby reducing the covariate shift (i.e., process shift) between source and target prediction tasks while improving the predictive modeling performance by continuously updating the DA weights. The results demonstrated the crucial role of integrating DA-LSTM and the Markov chain generative model (MMCM) for accurately predicting human enteric pathogens and associated viral concentrations, and for estimating LRVs across various WMs for rapid monitoring and response in protecting community health. Future work should focus on effectively predicting viral concentrations and virus removal efficiency by streaming the DA-LSTM prediction process under different scenarios and in the presence of abrupt changes across different technologies, including anaerobic MBR plants.

The current version of the DA-LSTM zero-shot generalization framework requires a retraining step of the baseline model before a downstream model enhancement analysis on new unseen datasets, specifically when adapting the viral particle prediction across various wastewater matrices.
Hence, adaptation is mainly based on a retraining phase similar to the out-of-distribution testing in ref. 22, which can undermine online adaptation if constant forgetting factors or periodic intervals are not implemented.

In future studies, transfer learning with an online adaptation phase could be considered to alleviate the retraining limitation of the current version of the DA-LSTM. Such a phase would comprise a receding horizon scheme based on statistical hypothesis testing of the distribution error between the predicted and true values of the viral particles. Furthermore, broader geographical representation and additional datasets capturing differences in industrial activity, climate, and microbial communities will be essential to overcome the limited number of case studies available in the present datasets.

Data availability statement

The datasets used and/or analyzed are available from the corresponding author upon request.

    References
    1. Harb, M. & Hong, P. Y. Molecular-based detection of potentially pathogenic bacteria in membrane bioreactor (MBR) systems treating municipal wastewater: a case study. Environ. Sci. Pollut. Res. 24, 5370–5380 (2017).
    2. Grabow, W. O. K. The virology of wastewater treatment. Water Res. 2, 675–701 (1968).
    3. Corpuz, A. V. et al. Viruses in wastewater: Occurrence, abundance and detection methods. Sci. Total Environ. 745, 140910 (2020).
    4. Manti, A. et al. Bacterial cell monitoring in wastewater treatment plants by flow cytometry. Water Environ. Res. 80, 346–354 (2008).
    5. Alharbi, M., Hong, P. Y. & Laleg-Kirati, T. M. Sliding window neural network-based sensing of bacteria in wastewater treatment plants. J. Process Control 110, 35–44 (2022).
    6. Zambrano, J., Krustok, I., Nehrenheim, E. & Carlsson, B. A simple model for algae–bacteria interaction in photo-bioreactors. Algal Res. 19, 155–161 (2016).
    7. Yang, J. et al. Model-based evaluation of algal-bacterial systems for sewage treatment. J. Water Process Eng. 38, 101568 (2020).
    8. Ekundayo, T. C., Adewoyin, M. A., Ijabadeniyi, O. A., Igbinosa, E. O. & Okoh, A. I. Machine learning-guided determination of Acinetobacter density in waterbodies receiving municipal and hospital wastewater effluents. Sci. Rep. 13, 7749 (2023).
    9. Alharbi, M. S., Hong, P. Y. & Laleg-Kirati, T. M. Adaptive neural network-based monitoring of wastewater treatment plants. Proc. 2022 American Control Conference (ACC), 3204–3211 (IEEE, 2022).
    10. Aljehani, F., N’Doye, I., Hong, P.-Y., Monjed, M. K. & Laleg-Kirati, T.-M. Bacteria cells estimation in wastewater treatment plants using data-driven models. IFAC-PapersOnLine 58, 718–723 (2024). 12th IFAC Symposium on Advanced Control of Chemical Processes.
    11. Kadoya, S. et al. A soft-sensor approach for predicting an indicator virus removal efficiency of a pilot-scale anaerobic membrane bioreactor (AnMBR). J. Water Health 22, 967–977 (2024).
    12. Bastin, G. On-line estimation and adaptive control of bioreactors (Elsevier, 2013).
    13. Zambrano, J., Krustok, I., Nehrenheim, E. & Carlsson, B. A simple model for algae-bacteria interaction in photo-bioreactors. Algal Res. 19, 155–161 (2016).
    14. Dochain, D. State and parameter estimation in chemical and biochemical processes: A tutorial. J. Process Control 13(8), 801–818 (2003).
    15. Schugerl, K. & Bellgard, K.-H. Bioreactor models in Bioreaction engineering: modeling and control (Springer-Verlag, 2000).
    16. Farhi, N., Kohen, E., Mamane, H. & Shavitt, Y. Prediction of wastewater treatment quality using LSTM neural network. Environ. Technol. Innov. 23, 101632 (2021).
    17. Pisa, I., Santin, I., Morell, A., Vicario, J. L. & Vilanova, R. LSTM-based wastewater treatment plants operation strategies for effluent quality improvement. IEEE Access 7, 159773–159786 (2019).
    18. Mokhtari, H. A., Bagheri, M., Mirbagheri, S. A. & Akbari, A. Performance evaluation and modelling of an integrated municipal wastewater treatment system using neural networks. Water and Environment Journal 34, 622–634 (2020).
    19. Wang, R. et al. Model construction and application for effluent prediction in wastewater treatment plant: Data processing method optimization and process parameters integration. J. Environ. Manage. 302, 114020 (2022).
    20. Alharbi, M., Hong, P.-Y. & Laleg-Kirati, T.-M. Sliding window neural network based sensing of bacteria in wastewater treatment plants. J. Process Control 110, 35–44 (2022).
    21. Myshkevych, Y., N’Doye, I., Sanchez Medina, J., Aljehani, F., Xiong, Y., Laleg-Kirati, T.-M. & Hong, P.-Y. Combining flow virometry with tree-based machine learning models for rapid virus particle estimation in different wastewater matrices. Water Research, 123905 (2025).
    22. Aljehani, F., N’Doye, I., Hong, P.-Y., Monjed, M. K. & Laleg-Kirati, T.-M. A calibration framework toward model generalization for bacteria concentration estimation in wastewater treatment plants. Sci. Rep. 14, 31218 (2024).
    23. Chen, J., N’Doye, I., Myshkevych, Y., Aljehani, F., Hong, P.-Y., Monjed, M. K. & Laleg-Kirati, T.-M. Viral particle prediction in wastewater treatment plants using nonlinear lifelong learning models. npj Clean Water 8, 28 (2025).
    24. Alvi, M., French, T., Cardell-Oliver, R., Batstone, D. & Akhtar, N. Enhanced deep predictive modeling of wastewater plants with limited data. IEEE Trans. Industr. Inf. 20, 1920–1930 (2023).
    25. Qin, Y., Song, D., Chen, H., Cheng, W., Jiang, G. & Cottrell, G. A dual-stage attention-based recurrent neural network for time series prediction. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2627–2633 (2017).
    26. Yoon, N. et al. Dual-stage attention-based LSTM for simulating performance of brackish water treatment plant. Desalination 512, 115107 (2021).
    27. An, T. et al. Adaptive prediction for effluent quality of wastewater treatment plant: improvement with a dual-stage attention-based LSTM network. J. Environ. Manage. 359, 120887 (2024).
    28. Chen, Q., Lin, N., Bu, S., Wang, H. & Zhang, B. Interpretable time-adaptive transient stability assessment based on dual-stage attention mechanism. IEEE Trans. Power Syst. 38, 2776–2790 (2023).
    29. Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010).
    30. Lin, N. et al. Resistive memory-based zero-shot liquid state machine for multimodal event data learning. Nature Computational Science 7, 37–47 (2025).
    31. Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In Advances in Neural Information Processing Systems, 29287–29303 (2021).
    32. Zheng, Y., Zhang, X., Zhou, Y., Zhang, Y., Zhang, T. & Farmani, R. Deep representation learning enables cross-basin water quality prediction under data-scarce conditions. npj Clean Water 8, 33 (2025).
    33. Van den Akker, B. et al. Validation of a full-scale membrane bioreactor and the impact of membrane cleaning on the removal of microbial indicators. Biores. Technol. 155, 432–437 (2014).
    34. Cheng, H., Monjed, M. K., Myshkevych, Y., Wang, T. & Hong, P.-Y. Accounting for the microbial assembly of each process in wastewater treatment plants (WWTPs): study of four WWTPs receiving similar influent streams. Appl. Environ. Microbiol. 90(4), e02253-e2323 (2024).
    35. Zhang, J., Zhang, J., Sano, D. & Chen, R. Comparison of activated sludge and virus interactions in aerobic and anaerobic membrane bioreactors. iScience 27 (2024).
    36. Chaudhry, R. M., Nelson, K. L. & Drewes, J. E. Mechanisms of pathogenic virus removal in a full-scale membrane bioreactor. Environ. Sci. Technol. 49, 2815–2822 (2015).
    37. Jumat, M. R. et al. Membrane bioreactor-based wastewater treatment plant in Saudi Arabia: Reduction of viral diversity, load, and infectious capacity. Water 9, 534 (2017).
    38. Timraz, K., Xiong, Y., Al Qarni, H. & Hong, P. Y. Removal of bacterial cells, antibiotic resistance genes and integrase genes by on-site hospital wastewater treatment plants: Surveillance of treated hospital effluent quality. Environ. Sci.: Water Res. Technol. 3(2), 293–303 (2017).
    Acknowledgements
    The authors thank the MODON WWTP operation team for granting us access to various wastewater samples.

    Funding
    KAUST-MEWA SPA (REP/1/6112-01-01), and Near Term Grand Challenge (AI) (REI/1/5233-01-01) awarded to Peiying Hong.

    Author information
    Authors and Affiliations
    Environmental Science and Engineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia: Jianxu Chen, Ibrahima N’Doye & Pei-Ying Hong
    Electrical and Engineering Program, Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia: Ibrahima N’Doye
    Center of Excellence on Smart Health, King Abdullah University of Science and Technology, 23955-6900, Thuwal, Saudi Arabia: Pei-Ying Hong
    Faculty of Science, Biology Department, Umm Al-Qura University, Makkah, Saudi Arabia: Mohammad Khalil Monjed

    Contributions
    J. C.: Methodology, investigation, software, validation, writing – review and editing. I. N.: Conceptualization, methodology, investigation, visualization, writing – original draft, writing – review and editing, supervision. M. K. M.: Data curation. P.-Y. H.: Conceptualization, supervision, resources, project administration, funding acquisition.

    Corresponding author
    Correspondence to Ibrahima N’Doye.

    Ethics declarations

    Competing Interests
    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    Additional information
    Publisher’s note
    Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Supplementary Information
    Supplementary Information.

    Rights and permissions
    Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
    About this article
    Cite this article
    Chen, J., N’Doye, I., Monjed, M.K. et al. Zero-shot generalization for predicting viral concentrations and evaluating removal efficiencies across wastewater matrices. Sci Rep 15, 41726 (2025). https://doi.org/10.1038/s41598-025-26384-4

    Received: 06 July 2025
    Accepted: 28 October 2025
    Published: 24 November 2025
    Version of record: 24 November 2025
    DOI: https://doi.org/10.1038/s41598-025-26384-4

    Keywords
    Wastewater matrices; Viral particle prediction; Effluent process drifts; Log removal value; Zero-shot generalization; Generative models; Dual-attention long short-term memory network