Detection of untreated sewage discharges to watercourses using machine learning

Shape analysis of 3038 daily flow patterns (2016–2020) for WWTP1 and WWTP2

An example effluent flow pattern for a 10-day period at WWTP1 is shown in Fig. 1. EDM detected spilling intervals are overlaid to demonstrate the flattening effect of spilling on the profile of the flow pattern.

Fig. 1: WWTP1: example effluent flow pattern for 10 days annotated with EDM confirmed spilling intervals.

A 24-h (midnight to midnight) daily flow pattern of 96 15-min-interval average flow rates (litres/second) of treated effluent is shown in blue. The black horizontal, linear annotations represent EDM recorded intervals denoting a discharge from a storm tank (i.e., consented spill or potentially unconsented spill of untreated sewage), the shortest being 15 min and the longest over 24-h. Total daily rainfall (mm/d) is provided in green. The first two days, with no detected spills, show diurnal patterns of low flow between midnight (previous day) and the first peak after mid-morning, followed by a lull until a second, smaller peak in the evening. The next seven days (15/12/18 through 21/12/18) involve spill intervals of various length (black EDM line), showing a flattening of flow, which is typical of storm discharge during heavy rainfall. The last day shows elevated flows and a partial return to a diurnal flow pattern with no spills reported.

For WWTP1 (resp. WWTP2), EDM data were available for 446 (resp. 471) consecutive days for 2018–2020 during which untreated sewage spill intervals of varying lengths had been recorded. For each day, spill intervals were aggregated to the total number of hours of discharge. Of the days used for machine learning for WWTP1 (resp. WWTP2), 339 (resp. 346) involved no EDM recorded spilling incidents and 107 (resp. 125) days had spills of various lengths, of which over a third lasted 24-h. For WWTP1 (resp. WWTP2), 97 (resp. 117) days with an aggregated ‘spill’ length of at least 3-h were labelled as ‘spill’ and 349 (resp. 354) with an aggregated ‘spill’ length of below 3-h as ‘normal’. A 3-h aggregation period was selected because it guaranteed a reasonable number of ‘spill’ days on which to base the supervised learning and because preliminary attempts to predict spilling hours per day were weakest for aggregated daily spills under 3-h. Where no EDM data were available, days were labelled as ‘unknown’. The average ‘normal’ (blue line) and ‘spill’ (black line) daily flow patterns as a proportion of storm overflow rates (red line) are shown in Fig. 2 for each WWTP. The storm overflow rates mark the minimum flow that should be treated before untreated sewage spills can be made in compliance with EA permits to discharge to watercourses.
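The daily labelling rule described above can be illustrated with a minimal sketch. It assumes EDM records arrive as non-overlapping (start, end) timestamp pairs, which is an assumption about the data format rather than something stated in the text; the function names and the example intervals are hypothetical. Only the 3-h threshold comes from the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SPILL_THRESHOLD_HOURS = 3.0  # aggregation threshold stated in the text


def daily_spill_hours(intervals):
    """Sum spill duration per calendar day, splitting intervals at midnight.

    `intervals` is an iterable of (start, end) datetime pairs that are
    assumed not to overlap one another.
    """
    hours = defaultdict(float)
    for start, end in intervals:
        cursor = start
        while cursor < end:
            next_midnight = datetime.combine(
                cursor.date() + timedelta(days=1), datetime.min.time()
            )
            segment_end = min(end, next_midnight)
            hours[cursor.date()] += (segment_end - cursor).total_seconds() / 3600.0
            cursor = segment_end
    return dict(hours)


def label_day(spill_hours):
    """Label a day with EDM coverage as 'spill' or 'normal'."""
    return "spill" if spill_hours >= SPILL_THRESHOLD_HOURS else "normal"


# Hypothetical EDM intervals, for illustration only.
example_intervals = [
    (datetime(2018, 12, 15, 22, 0), datetime(2018, 12, 16, 2, 30)),
    (datetime(2018, 12, 16, 10, 0), datetime(2018, 12, 16, 11, 0)),
]
example_hours = daily_spill_hours(example_intervals)
example_labels = {day: label_day(h) for day, h in example_hours.items()}
```

In this toy case the first day accumulates 2 h and is labelled ‘normal’, while the second accumulates 3.5 h and is labelled ‘spill’. Days with EDM coverage but no recorded interval would need to be added with zero hours by the caller.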

Fig. 2: Average daily flow patterns.

a WWTP1: black curve for ‘spill’ days (n = 97) and blue curve for ‘normal’ days (n = 349); b WWTP2: black curve for ‘spill’ days (n = 117) and blue curve for ‘normal’ days (n = 354).

Separate shape models were generated for flow patterns from 2016 to 2020 for WWTP1 (n = 1511) and WWTP2 (n = 1527). The first principal component of shape variation, PCA1, in both models, is associated with magnitude and temporal shifting of the morning flow peak (see Supplementary Video 1.mp4) as well as “seasonal” changes related to daylight saving, public holidays and vacation periods (Supplementary Fig. 1). Despite differences in the population served by WWTP1 and WWTP2, Fig. 3a, b shows similar distributions for scatter plots of PCA1 vs PCA2 for 2121 flows for 2016–2018 without EDM data. Analogous plots of PCA1 vs PCA2 for 917 flows for 2018–2020 with EDM data (Fig. 3c, d) suggest that, for both WWTPs, PCA2 is correlated with shape difference between ‘normal’ flow (open circles) and ‘spill’ affected flow (filled triangles). This spill-related flattening is illustrated by morphing the overall average daily flow pattern for WWTP1 between −1 and +1 standard deviations of PCA2 (Supplementary Video 2.mp4). Interestingly, the area under the receiver-operating characteristic curve associated with using PCA2 alone for ‘normal’/‘spill’ discrimination is 0.88 and 0.91 for WWTP1 and WWTP2, respectively (this is the estimated probability of correctly classifying a pair of flow patterns selected randomly, one each, from the ‘normal’ and ‘spill’ labelled subsets).
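The pairwise interpretation of the area under the receiver-operating characteristic curve given above can be made concrete with a short sketch. It assumes that ‘spill’ patterns tend to have the higher score (the sign of a PCA mode is arbitrary, so the scores may need negating in practice). The function name and the scores are hypothetical and are not taken from the study.

```python
def pairwise_auc(normal_scores, spill_scores):
    """Estimate AUC as the fraction of (normal, spill) pairs ranked correctly.

    A pair counts as correct when the spill score exceeds the normal score;
    ties count as half. This is the Mann-Whitney form of the AUC.
    """
    if not normal_scores or not spill_scores:
        raise ValueError("both groups must contain at least one score")
    correct = 0.0
    for spill in spill_scores:
        for normal in normal_scores:
            if spill > normal:
                correct += 1.0
            elif spill == normal:
                correct += 0.5
    return correct / (len(normal_scores) * len(spill_scores))


# Hypothetical PCA2 scores, for illustration only.
normal_pca2 = [-1.2, -0.4, 0.1, 0.3]
spill_pca2 = [0.2, 0.9, 1.5]
example_auc = pairwise_auc(normal_pca2, spill_pca2)
```

Here 11 of the 12 possible pairs are ordered correctly, so the estimate is 11/12. The double loop is quadratic in the number of patterns, which is acceptable for a few hundred days per group.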

Fig. 3: PCA1 vs PCA2 for daily flow patterns.

Unknown spill status (grey filled circles); spill confirmed by EDM (filled black triangles); confirmed as normal by EDM (unfilled grey circles). 2016–2018 without EDM data: a WWTP1 (n = 1065); b WWTP2 (n = 1056). 2018–2020 with EDM data: c WWTP1 (n = 446); d WWTP2 (n = 471).

Supervised learning of the effect of sewage spills on 917 effluent flow patterns

The performance of 20-fold cross-validation of supervised learning for labelled flow patterns for WWTP1 and WWTP2 is shown in Supplementary Tables 3 and 4 for 20 support vector machine (SVM) variations while retaining up to 15 PCA modes for flow pattern synthesis. The number of PCA modes retained for shape synthesis affects the validity of the reconstruction of each daily flow pattern and hence classification accuracy. For the three best-performing algorithms, Supplementary Fig. 2 shows the variation in classification accuracy of daily flow patterns for different numbers of retained PCA modes estimated as the average area under the 20 receiver-operating characteristic curves associated with the cross-validation folds. For the optimal classifiers, the average area under the receiver-operating characteristic curve was 0.97 for WWTP1 and 0.96 for WWTP2.

For verification, prior to wider application, the optimal ML classifiers defined for each WWTP were used to reclassify the flow patterns used in their derivation. Figure 4 shows these flow patterns in contiguous temporal sequence with annotations for each day reflecting EDM detected spill intervals (horizontal black segments) and ML confirmation of ‘spill’ (unfilled gold circles). During this period there were 97 (resp. 117) days with an EDM confirmed aggregated spill of at least 3-h at WWTP1 (resp. WWTP2). The agreement between optimal ML classification and spill day labels derived from EDM data was extremely high (WWTP1: sensitivity = 0.91, specificity = 0.95; WWTP2: sensitivity = 0.98, specificity = 0.98), as would be expected for such “training” data.
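For readers less familiar with the two agreement measures quoted above, the following sketch shows how sensitivity and specificity would be computed from day labels. It treats the EDM-derived label as the reference and the ML output as the prediction. The function name and the label lists are hypothetical and do not reproduce the study's data.

```python
def sensitivity_specificity(reference, predicted, positive="spill"):
    """Return (sensitivity, specificity) of `predicted` against `reference`."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1  # reference spill day flagged as spill
            else:
                fn += 1  # reference spill day missed
        else:
            if pred == positive:
                fp += 1  # reference normal day flagged as spill
            else:
                tn += 1  # reference normal day left as normal
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels, for illustration only.
edm_labels = ["spill"] * 4 + ["normal"] * 6
ml_labels = ["spill", "spill", "spill", "normal"] + ["normal"] * 5 + ["spill"]
example_sens, example_spec = sensitivity_specificity(edm_labels, ml_labels)
```

In this toy case three of four reference ‘spill’ days are recovered (sensitivity 0.75) and five of six reference ‘normal’ days are left unflagged (specificity 5/6). The sketch assumes both classes are present in the reference labels.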

Fig. 4: Daily effluent flow patterns and event duration monitor (EDM) detected spill intervals at WWTP1 and WWTP2 used as training data (Dec’2018–Mar’2020).

The daily flow and EDM spill data are measured at 15 min intervals. Flow is coloured (orange/blue/pink) to distinguish different years. Black horizontal lines delimit EDM detected spill intervals. Daily flows of aggregated spill length of at least/less than 3-h are labelled as ‘spill’/‘normal’ prior to the supervised learning. Gold circles indicate days classified as ‘spill’ following the training of the machine learning (ML) algorithms to produce an optimal classifier for each WWTP. The grey dashed line represents the storm overflow which defines the minimum sewage flow that should be treated even during storm filling or overflow. Additional annotations are telemetry alarms provided by the operator. These alarms have the potential to corroborate ML predictions of ‘spill’ days for the unseen flow patterns from 2009 to 2018 for which there is no EDM data. Similar charts showing the unseen ML classification of the 2009–2018 daily flow patterns overlaid with rainfall and river level data are provided in Supplementary Figs 5–10.

Figure 4 also includes data from other alarms related to untreated sewage discharges that have the potential to corroborate ML flow pattern classification for historical periods without EDM data. For WWTP1, there is near-perfect agreement (Cohen’s kappa: 0.81–1.00) between the EDM, STO (Storm Tank Overflow) and COL (Consented Overflow Level) alarms and ML classification for Feb ‘19–Feb ‘20 (Fig. 4 and Table 2). For just two months, Dec ‘18 and Jan ‘19, the EDM and COL devices concur with near-perfect agreement (Cohen’s kappa = 0.95), the STO device was largely at odds (Cohen’s kappa ≤ 0), and the ML classifier flagged incidents detected by all three. These results suggest that the STO is a good candidate and the COL alarm is an excellent candidate for corroborating ML detected putative spills at WWTP1 when EDM data is unavailable.

Table 2 Agreement of ML classification, EDM, COL and STO alarms for the supervised learning.

For WWTP2, there is almost perfect agreement between EDM and COL alarms (Cohen’s kappa = 0.87) and with ML classification (Cohen’s kappa = 0.78) (Fig. 4 and Table 2). No STO alarm data were provided for 2020 and between Dec ‘18 and Dec ‘19 STO showed only chance agreement with other devices and the ML classifier (Cohen’s kappa < 0.1). These results suggest that STO is a poor candidate while COL is an excellent candidate for corroborating ML detected putative spills at WWTP2 when EDM data is unavailable.
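Cohen's kappa, used above to compare devices and the ML classifier, corrects the observed agreement for the agreement expected by chance. A minimal sketch of the standard two-rater formula follows; the function name and the daily flags are hypothetical and do not reproduce Table 2.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of days on which the two sources agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each source's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources constant and identical
    return (observed - expected) / (1.0 - expected)


# Hypothetical daily flags (1 = spill reported), for illustration only.
device_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
device_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
example_kappa = cohens_kappa(device_a, device_b)
```

Here the sources agree on 8 of 10 days while chance agreement is 0.52, giving a kappa of about 0.58. Values near zero, such as those reported for the STO alarm at WWTP2, indicate agreement no better than chance.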

Detection of spills in 7160 daily flow patterns (2009–2018) not used to train ML algorithms

The classification of 2121 flow patterns from Jan 2016 to Nov 2018 was considered semi-blinded as they were used in shape analysis but not in the ML “training”, whereas the 5039 flow patterns from 2009 to 2015 were classified fully blinded as they were not used in either. Table 3 summarises the annual number of potential ‘spill’ days detected by the ML algorithms.

Table 3 Number of potential ‘spill’ days detected by machine learning.

A subset of 327 ‘spill’ days detected by the ML analysis between 2009 and 2018 at WWTP1 were corroborated by STO or COL alarm data. For the same period, a subset of 128 ‘spill’ days detected at WWTP2 were corroborated by STO or COL alarm data. The COL alarm corroborated all detected spills for which it was available while the unreliability shown earlier for the STO alarm at WWTP2 suggested an alternative approach to corroborate spills detected by ML analysis. For both WWTPs, approximately three additional months of flow and EDM data (87 days from March 7th 2020 to June 1st 2020) were available after the end of the ML training data. This period was omitted from the original ML training data because the daily flow volume at WWTP1 was zero or less than 1% of expectation for more than 50% of the time and hence unusable (Supplementary Fig. 12). Such data anomalies are in any case a breach of the EA permit requirement that only 37 days in total in each year be missing or suspicious. However, it was possible to perform blinded testing of the 87 daily flow patterns from WWTP2 against the classification models constructed for Dec’2018–Mar’2020 and demonstrate corroborative agreement with the EDM data 93% of the time (Supplementary Fig. 12).

When WWTP1 spilled untreated sewage, whether detected by COL/EDM alarms or by ML classification, it typically did so at an effluent flow rate that was considerably below the storm overflow level (50.52 l/s) stipulated in its EA permit as the minimum flow rate for continued treatment (pass forward flow or PFF). This can also be seen in the 2018–2020 EDM monitored period. A comparison of average ‘spill’ and ‘normal’ flow patterns (Fig. 2) shows that the average effluent flow for ‘spill’ days at WWTP1 is never above the storm overflow rate, whereas at WWTP2 it is always above. Specifically, 141 of 274 (51.5%) non-aggregated (i.e. individual) spills detected by EDM at WWTP1 start when the effluent rate is less than 80% of the storm overflow rate, compared to none at WWTP2 (Supplementary Fig. 11).
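The ‘early spill’ check described above amounts to comparing the effluent rate at the start of each EDM interval with 80% of the permitted storm overflow rate. A minimal sketch follows; the function name and the example start-of-spill flow rates are hypothetical, while the overflow rate of 50.52 l/s and the counts 141 and 274 are those quoted in the text.

```python
WWTP1_STORM_OVERFLOW_LPS = 50.52  # litres/second, as quoted in the text


def early_spill_summary(start_flows, overflow_rate, fraction=0.8):
    """Count spills that begin below `fraction` of the storm overflow rate.

    `start_flows` holds the effluent rate (l/s) at the start of each spill.
    Returns (number of early spills, proportion of all spills).
    """
    if not start_flows:
        raise ValueError("at least one spill is required")
    threshold = fraction * overflow_rate
    early = sum(1 for flow in start_flows if flow < threshold)
    return early, early / len(start_flows)


# Hypothetical start-of-spill flow rates, for illustration only.
example_early, example_share = early_spill_summary(
    [35.0, 41.0, 52.3, 30.1], WWTP1_STORM_OVERFLOW_LPS
)

# Arithmetic check of the proportion quoted in the text.
reported_percentage = round(100.0 * 141 / 274, 1)
```

With these toy values the threshold is about 40.4 l/s, so two of the four spills start ‘early’. The final line reproduces the 51.5% quoted for WWTP1 from the stated counts.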

Due to the COVID-19 related lockdown from March 2020, permits for both WWTPs valid for the period prior to 2018 could not be provided in response to an EIR request to the Environment Agency because they were not in electronic format and premises were inaccessible. However, for both WWTPs, the current permits, which include historical amendments, suggest that the storm overflow settings have remained unaltered since before 2009. It appears, therefore, that WWTP1 has been spilling ‘early’ for more than 12 years whereas WWTP2 has rarely done so and, even then, only marginally.

ML detection of isolated and contiguous series of 24-h spills

For each WWTP, the daily flow patterns detected by EDM or ML analysis were ordered by the degree of flattening of the flow pattern as measured by the standard deviation of the 96 constituent 15-min interval flow rates. For ML detected ‘spills’ at WWTP1 without EDM data, the 20 most “flattened” daily effluent flow patterns are compared in Fig. 5 to the average dry weather flow. Each flow reflects persistent 24-h spilling at an effluent flow between 60% and 80% of the minimum required. In contrast, the 20 most “flattened” daily flows at WWTP2 without EDM data have an effluent rate greater than or equal to the corresponding storm overflow rate and so are likely to comply with the minimum flow to treatment condition. Nevertheless, two of these “top twenty” 24-h spills at WWTP2 in Fig. 5b, on 05/05/2012 and 12/05/2012, occur on a rainless day following a dry previous 24-h. Therefore, they are likely to be due to groundwater ingress which the EA considers to be unpermitted. It is widely recognised that groundwater ingress into sewer networks does occur, especially in England where many sewerage networks have been in place for more than 100 years. It is difficult to obtain groundwater level data for specific locations and for specific days when spills have occurred. Moreover, the underlying geology for the sewerage networks and sewage pumping stations (SPSs) feeding the two WWTPs in this study varies quite considerably without borehole data local to each SPS.
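The flatness ordering described at the start of this passage can be sketched as follows. The text specifies only that the standard deviation of the 96 flow rates was used; the choice of the population form here is an assumption (it does not change the ordering for equal-length patterns). The function name and the synthetic patterns are hypothetical.

```python
import math
from statistics import pstdev


def rank_by_flatness(patterns, top_k=20):
    """Return the `top_k` flattest days as (day, standard deviation) pairs.

    `patterns` maps a day identifier to its list of 15-min flow rates (l/s).
    A smaller standard deviation means a flatter daily flow pattern.
    """
    scored = [(day, pstdev(flows)) for day, flows in patterns.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:top_k]


# Synthetic 96-point patterns, for illustration only.
example_patterns = {
    "diurnal": [30.0 + 10.0 * math.sin(2 * math.pi * i / 96) for i in range(96)],
    "damped": [33.0 + 2.0 * math.sin(2 * math.pi * i / 96) for i in range(96)],
    "flat": [35.0] * 96,
}
example_ranking = rank_by_flatness(example_patterns, top_k=2)
```

The constant pattern ranks first with zero deviation, followed by the damped one, and the strongly diurnal pattern is excluded by `top_k=2`.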

Fig. 5: The 20 daily effluent flows most flattened by 24-h spilling compared to the average daily dry weather flow.

For WWTP1, each spill lasts 24-h during which the effluent rate is between 60% and 80% of the storm overflow rate. For WWTP2, in contrast, the effluent rate is at or above the corresponding storm overflow rate. Also, two 24-h spills (5.5.12 and 12.5.12) are highlighted as “Dry Spills” because there was no rainfall on the day they occurred nor on the previous day.

An isolated 24-h spill of untreated sewage covers a complete diurnal sewage cycle and so includes the twin peaks of maximum inflow when spilled sewage dilution is likely to be least and risk of pollution damage greatest. But, worse still, is the pollution potential caused by an unbroken series of 24-h untreated sewage spills during which a receiving watercourse has no respite nor opportunity to recover.

2009–2018: The ML analysis detected over 160 24-h spills at WWTP1, of which 105 were corroborated by STO or COL alarm alerts. Similarly, 200 24-h spills were detected at WWTP2. These involved multiple examples of contiguous 24-h spills of more than 10 days.

At WWTP2, a notable near-continuous ‘spill’ of 60 days was detected by the ML classifier between 21/12/2013 and 22/02/2014 (see Supplementary Figs 8 and 9). Extensive sewage fungus in the receiving watercourse had been reported to the EA (27/01/2014 and 03/02/2014) by a member of the public before the EA visited the works on 06/02/2014 to investigate. The EA Compliance Assessment Report concluded that

“There is extensive sewage fungus over 1.5 km of watercourse with a corresponding negative impact on the aquatic environment. Our fisheries and biodiversity teams are very concerned by the impact which we have classified as an ongoing Category 2 incident”.

No prosecution was made. On more than 20 days during this 60-day spill, rainfall was below 2 mm. Similar series of contiguous 24-h spills were detected by the ML analysis in 2012 (14 days), 2013 (16 days, 8 days), 2015–2016 (17 days). Each of these spills also contained subseries of 2 or more consecutive days without rainfall.
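Contiguous series of 24-h spill days such as those listed above can be found by grouping consecutive calendar dates. The sketch below assumes the days classified as 24-h spills are already available as a list of dates; the function name and the example dates are hypothetical.

```python
from datetime import date, timedelta


def contiguous_runs(spill_days, min_length=2):
    """Group spill dates into runs of consecutive days.

    Returns a list of (first day, last day, length in days) tuples for runs
    of at least `min_length` days, in chronological order.
    """
    ordered = sorted(set(spill_days))
    if not ordered:
        return []
    runs = []
    run_start = previous = ordered[0]
    for day in ordered[1:]:
        if day - previous != timedelta(days=1):
            # A gap closes the current run and starts a new one.
            runs.append((run_start, previous))
            run_start = day
        previous = day
    runs.append((run_start, previous))
    return [
        (start, end, (end - start).days + 1)
        for start, end in runs
        if (end - start).days + 1 >= min_length
    ]


# Hypothetical 24-h spill days, for illustration only.
example_days = [
    date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 3),
    date(2000, 1, 10),
    date(2000, 1, 20), date(2000, 1, 21),
]
example_runs = contiguous_runs(example_days)
```

This toy input yields one run of three days and one of two; the isolated day is dropped by `min_length=2`. A ‘near-continuous’ spill would need a tolerance for short gaps, which this sketch does not implement.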

2018–2020: EIR requests established that in 2019, WWTP1 spilled for over 1000 h on 72 days (mean: 15 h/spilling day) including 21 ML detected 24-h spills with contiguous series of 2–11 days; similarly, WWTP2 spilled for over 1390 h on 76 days (mean: 18.3 h/spilling day) including 32 ML detected 24-h spills with multiple contiguous series of 2–14 days. A near-continuous spill at WWTP2, for ~30 days in November 2019, included 14 days during which, at most, 2 mm of rainfall had occurred (see Fig. 4). As was the case in 2014, the spills resulted in extensive sewage fungus that was reported to the EA by a member of the public (Fig. 6).

Fig. 6: Photograph of sewage fungus.

Sewage fungus resulting from 30 day spill of untreated sewage from WWTP2 in November 2019.

These long spills and sewage fungal growth involved periods of unexceptional rainfall. Our analysis suggests that, for at least nine years, WWTP2 is likely to have been, and continues to be, subject to groundwater ingress—a driver of sewage spills that the EA considers to be unpermitted.
