### Quantitative variables

The intervention described here relies on the IT platform “SmartH2O” for the collection and visualization of smart meter data, the provision of consumption feedback to the user, the delivery of water-saving recommendations, and the engagement of the consumer through a gamification program^{20,21,34,35}. We embedded a gamification mechanism in the digital platform to maximize user retention and stimulate the exploration and sharing of content and the setting and achievement of personal saving goals. Via the gamification mechanisms, users could collect reward points for different actions performed in the digital platform or the achievement of water-saving targets. Reward points consisted of virtual points that the users could redeem for physical rewards. The design of the SmartH2O digital platform and the behavioral change stimuli that have been introduced in the Valencia case-control study (e.g., web and mobile app, different reward schemes), along with their individual elements and the corresponding illustrative screenshots of the platform are provided in the Supplementary Information, consistently with the information published in a previous study^{21}. Other platforms similar to SmartH2O or approaches for water conservation based on digital technologies are reported in the literature, including, e.g., real-time water consumption feedback on in-home displays, interactive dashboards, and games^{36,37}. Yet, to the author’s knowledge, SmartH2O is the first platform of its kind whose effect is rigorously assessed in the medium term and long term.

#### Household-scale water consumption data and smart meter sampling frequency

Water consumption readings measured at the household scale constitute the main quantitative variable of interest used in this observational study to identify behavior changes. The SmartH2O digital platform relies on water consumption information stored in a central database and enables data communication from the water utility to the water consumers (see Supplementary Fig. 1 for its software architecture). Water consumption data are collected by smart meters installed at the household premises, according to a schedule that considers the maximum available frequency of data sampling at each installation (hourly or daily). The consumption data are anonymized by the utility company, filtered, and transferred to the central database of the SmartH2O platform. The content of the central database is published to the user via a web portal and a mobile application, which are the entry points of all users’ interactions with the platform.

Besides the time series of water consumption, we also stored the sampling frequency allowed by each household-scale smart meter. Two types of sampling frequencies were available in the considered population, depending on the installed smart meter hardware: hourly or daily.

#### Digital user engagement variables

The central database of the SmartH2O platform comprises content for improving user awareness, such as water-saving recommendations, and for implementing the gamification program, such as the description of virtual and physical rewards. The interaction of the users with the platform and the overall user experience features several functionalities, including user login, water consumption and smart meter-based feedback visualization, conservation goal settings, and different gamified water conservation awareness actions (see also Supplementary Notes1). We monitored the activity of each user in the SmartH2O platform for the entire duration of the treatment period and gathered quantitative data on these four *digital user engagement variables*:

- (i)
*Login count*, defined as the total number of logins executed by each user. - (ii)
*Non-rewarded action count*, defined as the total number of actions performed by each user, with no reward points associated. - (iii)
*Rewarded action count*, defined as the total number of actions performed by each user, with associated reward points upon their completion. - (iv)
*Cumulative reward importance*, defined as the total amount of points achieved by each user by completing the rewarded actions. It accounts for the total amount of points, badges, and rewards achieved by an individual user in the SmartH2O platform.

Each user profile in the SmartH2O platform was associated with a unique smart meter ID, which allowed linking the user activity in the platform with the household water consumption data. User confidentiality was maintained throughout the full study as data were anonymized by the water utility managing the water meters and the central database.

### Population and study size

Our observational study was conducted in the city of Valencia, Spain. With a population of 794,288 inhabitants, as reported in 2019 by the Spanish National Institute of Statistics (Institudo Nacional de Estadística)^{38}, Valencia is the third-largest city in Spain. The water utility of Valencia (Global Omnium–EMIVASA) has installed more than 425,000 smart water meters since the early developments in 2006 to monitor the water consumption of nearly all the population^{39} (the last official census data, recorded in 2011, report 419,994 households in total in Valencia^{40}). The total population considered in this study after application of the exclusion criteria described in the next section included 334 individual households, each equipped with a water meter.

The architecture of the smart metering infrastructure deployed in Valencia has been designed in order to be vendor-independent, so it allows for different smart metering solutions to be integrated^{39}. While this is clearly an advantage for procurement, the diversity of hardware has an impact on data sampling and only one of the available technologies supports hourly data collection, which is a preferred requirement for water consumption data quality assessment and provision of sub-daily water consumption information to households in our case-control study. The number of hourly reading meters in Valencia amounts to 168,172 as of July 12th, 2020. EMIVASA also offered its customers access to a web platform where bills and invoices could be managed and also information about the current (daily and monthly) water consumption data was made available.

During our observational study, we integrated the digital SmartH2O platform^{20,21} in the EMIVASA portal. We invited users who already had an account in the platform and a compatible meter reading frequency to voluntarily join our observational study and sign up to the SmartH2O platform. The recruitment campaign was performed using different media channels, namely, newspaper articles on consumer magazines, radio programs, banners on the digital and printed invoices sent to EMIVASA customers, and also a Facebook campaign targeting the Valencia area. At the end of the recruitment campaign, we received 525 applications out of which we obtained a treatment group composed of 223 households after application of the inclusion/exclusion criteria. Out of the households who did not apply to join the case-control study during the recruitment phase, 111 households agreed to be monitored as part of the self-selected control group to be considered as a benchmark group not subject to treatment, after active recruitment via phone by the EMIVASA call center (client service management). Households in the control group had only access to their water consumption data through the already existing platform, which did not offer any type of smart meter-based consumption feedback, behavioral stimuli, and/or gamification elements.

Informed consent was obtained from the households monitored in this study. Moreover, the water utility (Global Omnium–EMIVASA) supervised and approved the collection, usage, and processing of the anonymized quantitative variables above described in compliance with the EU General Data Protection Regulation 2016/679 and the pre-existing Spanish law 15/1999 LOPD of 1999 (the SmartH2O study started before the adoption of the GDPR in 2016).

### Baseline and observation periods

The treatment period of the case-control study lasted 8.5 months, from June 2016 to February 2017. We also continuously collected anonymized water consumption data for the study population from June 2016 to February 2019 both to conduct the longitudinal study presented in this paper and evaluate water consumption changes over time in comparison with a pre-treatment baseline (June 1st, 2015– April 30th, 2016), as well as to compare water consumption changes in the treatment and control groups. Consistently with the months included in the treatment period (short-term behavior change), we identify the observation period June 1st, 2017–February 2nd, 2018 for medium-term behavior change assessment, and the observation period June 1st, 2018–February 2nd, 2019 for long-term behavior change.

### Exclusion criteria

The population considered for analysis of water consumption changes in this observational study was obtained by sequential application of the following exclusion criteria.

- 1.
*Exclusion of empty households*. First, we excluded the households with no data in the baseline and treatment period. We classified in this category also the households with a cumulative water consumption lower than 1.5 m^{3}over the whole baseline and treatment period (which together last nearly 20 months). This threshold value was identified as a conservative choice after consultation with the local water utility and comparison with the average values of water consumption in the entire population (slightly above 0.21 m^{3}/day) and the European average water consumption, which amounts to 128 liters per inhabitant per day (0.128 m^{3}/day)^{41}. A household in the considered population would use ~1.5 m^{3}in one week (0.21 m^{3}/day × 7 days). While lower values than the average consumption are observed in those days in which the inhabitants spend little time at home, a cumulative consumption of 1.5 m^{3}over the course of more than 1 year can indicate that the house is generally empty (and possibly the observed water consumption is due to leaks). - 2.
*Exclusion of households with insufficient data length*. We removed the households with water consumption readings for less than 1000 h (approximately 6 weeks). This step guarantees a minimum representation of weekend/weekday water demand variation for more than 1 month (please note that the total duration of the treatment period is 8.5 months). - 3.
*Exclusion of partially empty households*. We excluded the households with more than 90% water consumption readings equal to zero in the baseline or observation period or completely lacking data for one of these two periods. We considered these households to be empty or equipped with faulty meters at least during one of the two short-term periods of interest. The above value threshold of 90% was identified with a trial-and-error procedure and expert-based data analysis that balance the rate of exclusion with the size of the remaining dataset. - 4.
*Exclusion of households lacking day-of-week representation*. We excluded the households with available observations for less than 7 unique day types, to guarantee a minimum representation of water consumption routines that depend on the day of the week. For those households with smart meters recording water consumption with hourly sampling frequency, we removed days with more than 4 h of gaps from the smart meter time series (anomalous meter data logging). - 5.
*Exclusion of households with anomalous high water consumption*. We considered hourly water consumption readings larger than 1 m^{3}as outliers (we thus removed these hourly readings) and we removed the households with a daily average water consumption larger than 1 m^{3}in at least one phase of the longitudinal study. High values of water consumption can be observed for specific days (e.g., when customers use water for outdoor irrigation or filling up a pool), yet average daily water consumption values over the selected threshold are more than three times higher than the European average (equivalent to approximately 0.3 m^{3}/day per household). We did not apply more restrictive thresholds, in order not to bias our analysis and avoid unjustified exclusion of high water consumers. - 6.
*Exclusion of households with unrealistic short-term consumption change levels*. We excluded the households with extreme values of short-term consumption change during the treatment period, which were identified as outliers by Tukey’s fences^{42}. According to Tukey’s fences, a data point*x*_{i}is considered an outlier if:$$x_i notin [Q_1 – kleft( {Q_3 – Q_1} right),Q_3 + k(Q_3 – Q_1)]$$

(1)

where

*Q*_{1}is the 25th empirical quartile (i.e., 25% of the data is lower than this point) and*Q*_{3}is the 75th empirical quartile (i.e., 75% of the data is lower than this point), and*k*= 1.5. Tukey’s fences with*k*= 1.5 approximate the 99.7% confidence interval defined for normal distributions by a distance of three standard deviations from the mean. - 7.
*Exclusion of households with anomalous conditions in medium-term and long-term*. We excluded 51 households that met the above exclusion criteria 1–6 during either the medium-term or long-term observation periods. Water consumption change patterns would be incomplete/anomalous for these households, with at least one missing/anomalous period out of the four periods of interest (i.e., baseline, treatment period, or following observation periods in 2018 and 2019).

With the above exclusion criteria, we obtained the 334 households considered for behavior change analysis in this observational study. More details on the population size after application of each exclusion criteria are reported with a flow diagram in Supplementary Fig. 10^{43}. It is worth noting that only less than 2% of high consumption households have been excluded, while most of the other excluded households had insufficient data or unrealistically low consumption levels. Also, the number of households in the sample considered here differ from those considered in the evaluation of the SmartH2O project^{44}, due to the different temporal length of the two studies and the application of the exclusion criteria on data recorded in different periods (the SmartH2O project only included the baseline and treatment periods).

Adopting the same criteria to exclude households from the behavior change analysis only during the summer period (Fig. 1d) resulted in a reduced population of 179 households (101 households in the treatment group and 78 households in the control group), due to limited data availability for the summer period. Similarly, a subset of 198 households in the treatment group was considered for the correlation analysis by logistic regression (Fig. 4), as the excluded 25 households presented incomplete smart meter data or incomplete information on their usage of the digital SmartH2O application.

### Data analysis and statistical methods

We performed customer segmentation to analyze heterogeneous long-term behavior change patterns (Fig. 3 and Supplementary Fig. 9). We applied agglomerative hierarchical clustering^{45} to the patterns of average daily household water consumption during the entire duration of the longitudinal study. Here, a water consumption pattern of a household is a vector that contains four values of average daily water consumption, i.e., one for each period of the observational study, including the baseline (see “Methods” section–Baseline and observation periods). The only variable given as input to the hierarchical clustering algorithm consists of household-scale average water consumption per day for each phase of our observational study, which spans the baseline and the three observation periods in 2017, 2018, and 2019. Complete linkage and correlation distance were considered for hierarchical clustering. Complete linkage calculates the distance between two household clusters as the distance between the farthest pair of household water consumption patterns in the two household clusters, i.e., the maximum distance formulated as follows:^{46}

$$dleft( {u,v} right) = {mathrm{max}}left( {{mathrm{dist}}left( {uleft( {x_i} right),vleft( {z_i} right)} right)} right.$$

(2)

where *d*(*u*,*v*) is the distance between clusters *u* and *v*, *x*_{i} are the points belonging to cluster *u* and z_{i} those belonging to cluster *v*. Given two vectors of observations *x*_{i} and *x*_{j}, which in our study correspond to the water consumption patterns of two households (each with *N* elements, with *N* = 4, where each element is the household-scale average water consumption per day for the baseline and three observation periods) and their mean values ((bar x_i) and (bar x_j)) the correlation distance used by hierarchical clustering is calculated as follows:^{47}

$${mathrm{dist}}left( {x_i,x_j} right) = 1 – frac{{(x_i – bar x_i) cdot (x_j – bar x_j)}}{{left| {x_i – bar x_i} right|_2left| {x_j – bar x_j} right|_2}}$$

(3)

We considered hierarchical clustering as an appropriate choice because the analysis of the different hierarchical levels allowed the discovery of heterogeneous water consumption behaviors that would be potentially hidden if algorithms requiring a predefined number of clusters were used. We adopted complete linkage clustering to avoid that individual, mutually close households would force pairs of clusters representing different behaviors to merge. Also, we adopted correlation distance as we wanted to identify similarities in water consumption patterns over time, rather than in water consumption volumes.

After clustering the households in the treatment group with the above hierarchical clustering, similarly to a previous study^{18}, we analyzed the coefficients of a logistic regression classifier cross-validated with binary tests to identify which candidate factors correlate with the main behavior change patterns that characterize the households in the treatment group (Fig. 4 and Supplementary Table 3). In this study, the input candidate factors consist of five independent variables that comprise the availability of smart meter hourly data frequency and the four digital user engagement variables, i.e., login count, non-rewarded action count, rewarded action count, and cumulative reward importance. First, we balanced the distribution of the households in the treatment group across the behavior change segments considered in the binary tests by Synthetic Minority Over-sampling Technique (SMOTE)^{48}. SMOTE oversamples the minority class to balance the sample distribution of a labeled dataset over the different classes. As we consider binary test where only two behavior change segments (or two groups of behavior change segments) are compared, the majority class represents the behavior change segment (or group of behavior change segments) with the highest number of samples and vice versa for the minority class. According to the SMOTE formulation^{48}, starting from a sample *c*_{i,initial}, which in this study is the vector of input candidate factors for a household *i* in the minority class, a new sample *c*_{i,new} is generated on the line between *c*_{i,initial}, and one of its k nearest-neighbors *c*_{j,initial}, with the following formula:

$$c_{i,{mathrm{new}}} = c_{i,{mathrm{initial}}} + lambda (c_{j,{mathrm{initial}}} – c_{i,{mathrm{initial}}})$$

(4)

where *λ* is a random number between 0 and 1, and *k* = 5 nearest neighbors computed based on Euclidean distance are considered by default^{48}. Among the possible options to perform class balancing, here we adopted a “not majority” strategy to over-sample the minority classes, i.e., we resample all classes but the majority class (which, in our binary problem, is equivalent to resampling the minority class).

Second, we trained a logistic regression classifier^{49} with *k*-fold cross-validation (*k* = 5) and evaluated its performance via weighted F1 score. In our binary problem, the logistic regression classifier models the class membership probability P(*y*_{i,p }= 1) for household *i*, where *y*_{i,p }= 1 indicates that the household belongs to behavior change pattern *p* (else *y*_{i,p }= 0, according to the following logistic function:

$$Pleft( {y_{i,p} = 1} right) = frac{1}{{1 + exp^{ – f(c_i)}}}$$

(5)

where *f*(*c*_{i}) is a linear function where the input variables *c*_{i} are weighted by corresponding coefficients *α*:

$$fleft( {c_i} right) = alpha _0 + alpha _1c_{i,1} + alpha _2c_{i,2} + ldots + alpha _Mc_{i,M} + varepsilon _i$$

(6)

In this study, *M* = 5, *c*_{i,1} is a binary variable representing the availability of smart meter with hourly data frequency, *c*_{i,{2,3,4,5}} are the four digital user engagement variables defined above, *α*_{0} is the intercept of the logistic regression, and *ε*_{i} is random noise. We normalized the variables before logistic regression classification by subtracting the mean and dividing by the standard deviation to rescale them to comparable value ranges. The analysis of their corresponding logistic regression coefficients reveals how these variables discriminate among different clusters of water consumers and, thus, how they are potential determinants of defined water consumption behaviors. The F1 score (FS) is first calculated for each behavior change pattern (or group of patterns) *p* as the harmonic mean of the precision and recall achieved by the logistic regression classifier, formulated as follows:

$${mathrm{FS}}_p = 2 times frac{{({mathrm{precision}}_p times {mathrm{recall}}_p)}}{{({mathrm{precision}}_p + {mathrm{recall}}_p)}}$$

(7)

$${mathrm{Precision}}_p = frac{{{mathrm{TP}}_p}}{{{mathrm{TP}}_p + {mathrm{FP}}_p}}$$

(8)

$${mathrm{Recall}}_p = frac{{{mathrm{TP}}_p}}{{{mathrm{TP}}_p + {mathrm{FN}}_p}}$$

(9)

where, given positive and negative classes, TP_{p}, FP_{p}, and FN_{p} are the number of true positive elements (the classifier correctly predicts the positive class for them), false-positive elements (the classifier incorrectly predicts the positive class), and false negative elements (the classifier incorrectly predicts the negative class). A weighted average of the FS_{p} is then computed to account for class imbalance:

$${mathrm{FS}}_{{mathrm{average}}} = frac{1}{H}mathop {sum }limits_{p in P} |p| times {mathrm{FS}}_p$$

(10)

where *P* is the total number of classes *p* and *H* is the total number of elements aggregated across all classes.

### Software implementation

We coded the exclusion criteria in Matlab and used the “prctile” function for the calculation of the quantiles in Tukey’s fences (last Matlab version tested: R2020b)^{50}. We implemented the customer segmentation analysis and logistic regression classifier in Python (version 3.7.1): the customer segmentation analysis relies on the hierarchical clustering included in the SciPy library^{51}; the logistic regression classifier, along with its *k*-fold cross-validation and performance evaluation, were implemented using the machine learning library Scikit-learn^{52}; SMOTE oversampling was implemented using the Imbalanced-learn toolbox^{53}. A notebook with the Python code used to generate the results reported in this article is available in a public GitHub repository^{54}.

Source: Resources - nature.com