# Global potential for harvesting drinking water from air using solar energy

### Water access data processing

Data on drinking water coverage by region was acquired from the WHO/UNICEF JMP. The JMP acts as official custodian of global data on water supply, sanitation and hygiene2 and assimilates data from administrative data, national census and surveys for individual countries, and maintains a database that can be accessed online through their website. We accessed data tables for national and subnational drinking water service levels from https://washdata.org.

JMP datasets are not geographically linked to official boundary files. We joined the tables to GIS boundaries obtained from the following open-source collections: GADM (https://gadm.org), the Spatial Data Repository of the Demographic and Health Surveys Program of USAID (DHS) and the Global Data Lab of Radboud University (GDL)2,50,51,52,53. Subnational regions reported by the JMP are unstructured, representing various regional administrative levels (province, state, district and others).

The JMP national and subnational data were joined to GIS boundaries using a custom geoprocessing tool built in Python and ArcGIS 10. The tool joins the available JMP subnational-level survey data to the closest name match of regional boundary names from a merged stack of GADM (admin1, admin2 and admin3), DHS and GDL boundaries worldwide. The JMP national-level survey data is then joined to GADM national (admin0) boundaries for countries which have no subnational data available. Finally, the two boundary-joined datasets (national and subnational) are merged, processed and exported as a seamless global fabric of water-stressed-population data at the highest respective spatial resolutions available (Fig. 1a).

JMP does not report the breakdown between the SMDW and basic service level within subnational regions, and instead reports a combined category called ‘at least basic’ (ALB). To estimate the SMDW values in subnational regions, a simple cross-multiplication was performed using the splits at the national level:

$${{rm{SMDW}}}_{{rm{subnational}}}=frac{{{rm{SMDW}}}_{{rm{national}}}}{{{rm{ALB}}}_{{rm{national}}}}{times {rm{ALB}}}_{{rm{subnational}}},$$

where ALBnational, ALBsubnational and SMDWnational are known values.

Validation of the cross-estimation of share of SMDW from ALB for subnational regions was conducted on a reference dataset of nationally representative household surveys that collected data on all criteria for SMDW54, shown in Extended Data Fig. 2. We report regression results of R2 = 0.87 and a standard error of 3.67, indicating a bias which over-reports SMDW share and a probable underestimate of people living without SMDW in our study. This discrepancy comes from JMP calculations of SMDW that rely on the minimum value of multiple drinking water service criteria (free from contamination, available when needed and accessible on premise) rather than considering whether individual households meet all criteria for SMDW55.

The fraction of population without SMDW was multiplied by residential population values in the WorldPop top-down unconstrained global mosaic population count of 2017 at 1 km spatial resolution56 (https://www.worldpop.org). WorldPop was accessed online as a TIF image and imported to Google Earth Engine. The year 2017 was chosen to more closely match water access data from JMP. The percentages reported by JMP are probably not uniform within most regions57, introducing an unknown error to Fig. 1b, but represent the best estimate available to us given the limitations of these regionally reported data.

### Climate input and conversion approximations

#### GHI and reference plane

We used GHI (in W m−2) as solar energy input data. GHI has good availability in climate datasets and introduces the fewest number of assumptions. Since GHI describes the irradiance in a locally horizontal reference plane, this approximation is only exact for devices having a horizontally oriented solar harvesting area. Annually averaged comparisons between horizontal and optimal fixed-tilt panels show negligible differences in direct plus diffuse radiation in tropical latitudes, and ratios below 25% in locations within 50° north and south latitudes58. Those seeking precise absolute predictions for tilted devices or higher latitudes are encouraged to adapt the provided code to their specific assumptions.

#### Conversion from SY to AWH output

As discussed in the main text, solar-driven AWH devices typically have one of two predominant energy inputs: thermal (converted directly from incident sunlight on the device) or electrical (from PV). Here, the energy units used to calculate yield in l kWh−1 are incident solar energy directly from GHI. The various assumptions are made in relation to the reported values based on their source. The thermal limits33, target curves, and experimental results reported by TRP15 and MOFs were assumed to have direct (100%) conversion from sunlight to heat. For the ZMW device, the table provided by the manufacturer accounts for system losses, so the table values were directly converted in our model35. For ref. 34 and the cooler–condenser limits from ref. 32, which both assume work input instead of heat, we applied a typical PV conversion efficiency of 20% to convert from sunlight kWh (GHI) to kWhPV (electrical work) input to the device59.

#### Sufficiently short sorbent cycling times

AWH-Geo assumes continuous or quasi-continuous AWH. AWH-Geo considers each 1-h timestep independently and is thus stateless. Aside from edge cases, this is a safe assumption for mass efficient SC-AWH devices, which typically have time constants shorter than 1 h, both for sorbent cycling and for most of the thermal time constants. For devices with longer time constants, batch devices or processes with slow (de)sorption kinetics, this assumption may introduce increased error, and may require further adaptation of the provided code.

### Climate time-series calculation

AWH-Geo is a resource-assessment tool for AWH. It consists of a geospatial processing pipeline for mapping water production (in litres per unit time) around the world of any solar-driven continuous AWH device that can be characterized by an output table of the form output = f(RH, T, GHI).

Output tables show AWH output values in l h−1 or l h−1 m−2 across permutations of the 3 main climate variables in the following ranges: RH between 0 and 100 % in intervals of 10%, GHI between 0 and 1,300 W m−2 in intervals of 100 W m−2, and T between 0 and 45 °C in intervals of 2.5 °C (2,145 total output values). The tables are converted into a 3D array image in Google Earth Engine and processed across the climate time-series image collection for the period of interest. Finally, these AWH output values are composited (reduced) to a single time-averaged statistic of interest as an image.

Climate time-series data was acquired from the ERA5-Land climate reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF)60, accessed from the Google Earth Engine data catalogue. ERA5-Land surface variables were used in 1-h intervals and 0.1°× 0.1° (nominal 9 km). The 10-year analysis period (2010–2019, inclusive) was used for this work, and represents a period long enough to provide a reasonable correction for medium-term interannual climatic variability.

Climate variables GHI and T were matched to ERA5-Land parameters ‘Surface solar radiation downwards’ (converted from cumulative to mean hourly) and ‘2 metre temperature’ (converted from K to °C), respectively. RH was calculated from the ambient and dew point temperature parameters in a relationship derived from the August–Roche–Magnus approximation61 rearranged as:

$${rm{RH}}=100 % times frac{{{rm{e}}}^{left(frac{a{T}_{{rm{d}}}}{b+{T}_{{rm{d}}}}right)}}{{{rm{e}}}^{left(frac{{aT}}{b+T}right)}}$$

where a is 17.625 (constant), b is 243.04 (constant), T is the ERA5-Land parameter ‘2 metre temperature’ converted from K to °C, and Td is the ERA5-Land parameter ‘2 metre dewpoint temperature’ converted from K to °C.

Spot validation of the climate parameters and the mapped output was performed manually in Google Earth Engine across several timesteps in 2016 in Ames, Iowa (using the Iowa Environmental Mesonet AMES-8-WSW station62) and showed insignificant error (< 5%).

### Mapping upper bounds

Figure 3a maps thermodynamic upper bound outputs for SC-AWH based on an equation from Kim et al. 33, reproduced below.

$$frac{{dot{Q}}_{{rm{hot}},{rm{in}},{rm{min }}}}{{dot{m}}_{{rm{water}},{rm{out}}}}=left[frac{1}{{omega }_{{rm{air}},{rm{in}}}-{omega }_{{rm{air}},{rm{out}}}}({e}_{{rm{air}},{rm{out}}}-{e}_{{rm{air}},{rm{in}}})+{e}_{{rm{water}},{rm{out}}}right]times {left(1-frac{{T}_{{rm{ambient}}}}{{T}_{{rm{hot}}}}right)}^{-1}$$

where ({dot{Q}}_{{rm{hot}},{rm{in}},{rm{min }}}) is the minimum input heat flux (in Wheat) required to drive the process, ({T}_{{rm{hot}}}) is the temperature (in K) at which the input heat is delivered, ({T}_{{rm{ambient}}}) is the ambient temperature (in K) at which heat is rejected and water and air exit the process, ({dot{m}}_{{rm{water}},{rm{out}}}) is the production rate of liquid water by mass, (omega )denotes humidity ratios in kg of water per kg of dry air, ({e}) denotes specific exergies, which can be looked up for given temperatures and humidities, subscript air,in denotes ambient air drawn in at ({T}_{{rm{ambient}}}) from which to extract moisture, subscript air,out denotes air exiting the process at ({T}_{{rm{ambient}}}) after extracting some moisture from it, subscript water,out denotes liquid water exiting the process at ({T}_{{rm{ambient}}}) as the desired product.

Parameters not present in this formula, but that are in Kim’s underlying derivation: this upper limit is obtained for a small recovery ratio (RR ~ 0) chosen for numerical stability and for reversible process conditions (entropy generation, Sgen = 0).

Kim’s model assumes an AWH in which the fundamental energies required are driven by input heat supplied at a temperature ({T}_{{rm{hot}}}). The limit it represents applies independent of the process, number of stages, sorbent choice, and so on, as long as heat drives the process.

We adapt Kim’s model to solar energy input, assuming an idealized conversion efficiency from solar irradiance to usable heat of 100%. This idealization retains a robust upper bound without bringing in additional parameters. Literature values for theoretical sun-to-heat efficiency limits range from >99.99 to 95.80% for thermal absorbers, depending on the level of angular selectivity63.

Rearranged, Kim’s model yields

$$frac{{dot{V}}_{{rm{water}},{rm{out}}}}{A}le {E}_{{rm{GHI}}}times left(1-frac{{T}_{{rm{ambient}}}}{{T}_{{rm{hot}}}}right)times {left[frac{1}{{omega }_{{rm{air}},{rm{in}}}-{omega }_{{rm{air}},{rm{out}}}}({e}_{{rm{air}},{rm{out}}}-{e}_{{rm{air}},{rm{in}}})+{e}_{{rm{water}},{rm{out}}}right]}^{-1}times frac{1}{{rho }_{{rm{water}}}}$$

where, in addition, ({dot{V}}_{{rm{water}},{rm{out}}}) is the production rate of liquid water by volume, ({A}) is the area harvesting sunlight (see approximation section below), ({E}_{{rm{GHI}}}) is GHI in Wsun m−2, and ({rho }_{{rm{water}}}) is the density of water.

This is now a function of the three key climate variables: GHI (in the first term), ambient temperature (in the second and hidden in the third term) and RH (entering the third term). This was converted to an output table and processed through the AWH-Geo pipeline and presented in Fig. 3a. While this can be run for any choice of parameter ({T}_{{rm{hot}}}), we present figures here for ({T}_{{rm{hot}}}) = 100 °C, a temperature still achievable in low-cost (non-vacuum) practical devices without tracking or sunlight concentration. Higher driving temperatures increase the upper bound for water output. For the limits analysis, values of RH above 90% are clamped to prevent unrealistically high theoretical outputs as Kim’s equation goes to infinity at 100% RH. A further assumption is made that new ambient air is efficiently refreshed.

Figure 3b maps the maximum yield for active cooler–condensers without recuperation of sensible heat—all given work input and an optimum coefficient of performance of the cooling unit at a condenser temperature that maximizes specific yield as modelled by Peeters32, which we digitized from their fig. 11. Peeters chose to set yield to zero whenever frost formation would be expected on the condenser. Since Peeters assumes work input, we convert from solar energy (GHI) to kWhPV as discussed above.

Figure 3c maps Zhao’s experimental results from a TRP using a logistic regression curve fit to their reported SYs of 0.21, 3.71 and 9.28 l kWh−1 at 30, 60 and 90% RH, respectively15. The terms of the curve fit are reported in the next section.

Custom yellow to blue map colours are based on www.ColorBrewer.org, by C. A. Brewer, Penn State64.

### Specific yield and target curves

Two simple characteristic equations, linear and logistic, were used to fit a limited set of SY and RH pairs from laboratory experiments or reported values and plotted through AWH-Geo using calculated output tables. Hypothetical curves of similar form whose terms were adjusted iteratively in AWH-Geo to goal-seek a target output (5 l d−1) and user base, and are reported here (for 1-m2 devices). In the following equations, RH in % is taken as a fraction (for example 55% is equivalent to 0.55).

The linear target curve is a simple linear function which crosses the y-axis at zero:

$${rm{SY}}({rm{RH}})=atimes {rm{RH}}$$

where a is set to 1.60, 1.86 and 2.60 L/kWh to reach targets of 0.5, 1.0, and 2.0 billion people without SMDW, respectively, and RH is input RH (fractional).

The logistic target curve is a logistic function:

$${rm{SY}}({rm{RH}})=frac{L}{1+{{rm{e}}}^{-k({rm{RH}}-{{rm{RH}}}_{0})}}$$

where L is set to 1.80, 2.40 and 4.80 L kWh−1 to reach targets of 0.5, 1.0 and 2.0 billion people without SMDW, respectively, k is the growth rate set to 10.0, and ({rm{RH}}) and ({{rm{RH}}}_{0}) are input RH (fractional), and 0.60, respectively.

The SY values reported by Zhao for TRPs (which they term ‘SMAG’) were fit to a logistic function of the same form with the following parameters: L set to 9.81 L kWh−1, k set to 11.25 and RH0 set to 0.645.

The resulting fitted SY profile is expanded into an output table. As with all reports providing SY values instead of full output tables, this forces an assumption of linearity in heat rate (approximately equal to GHI), which may introduce error at lower GHI levels. Zhao reports SY of the TRP material is consistent across temperature below 40 °C—the material’s lower critical solution temperature—above which its performance drops precipitously. Accordingly, we set the SY to 0 l kWh−1 for temperatures ≥40 °C in the output table.

Bagheri reported performance of three existing AWH devices across several climate conditions using an ‘energy consumption rate’ in kWh/L, which can be considered to be the SEC, and the simple reciprocal of SY. Instead of fitting a logistic curve to the reciprocals, we fit an exponential function to the average SEC of the three devices in conditions above 20 °C of the equation:

$${rm{SEC}}({rm{RH}})=9.03{{rm{e}}}^{-2.99{rm{RH}}}$$

where SEC is specific energy consumption in kWhPV l−1 and RH is fractional.

This was applied to RH and taken as reciprocal in an output table and run through AWH-Geo. Since Bagheri reports the equivalent of kWhPV, we scale to adapt to GHI input with a photovoltaic conversion efficiency as discussed above.

For performance of the ZMW device (the company’s ~3 m2 SOURCE Hydropanel), we used values from the panel production contour plot in the technical specification sheet available from the manufacturer’s website35. The decision for inclusion was made owing to the importance as an early example of a SC-AWH product with commercial intent. Values in l per panel per day were taken at each 10% RH step at 5 kWh m−2, assumed to represent kWh m−2 d−1, and divided by 15 kWh (~3 m2 × 5 kWh m−2) to convert to SY in l kWh−1. From the resulting SY curve, an output table was generated and processed with AWH-Geo.

### Coincidence analysis and population sums

The coincidence analysis was run through AWH-Geo across 70 threshold pairs given the full permutation set of RH from 10 to 100% and GHI from 400 to 700 W m−2 threshold intervals, using binary image time series. The resulting mean multiplied by 24 represents average hours per day thresholds are met simultaneously, giving ophd. Below is a functional representation of this time-series calculation:

$${langle ({{rm{RH}}}_{t,{rm{px}}} > {{rm{RH}}}_{{rm{threshold}}}){{rm{& & }}}_{{rm{simultaneous}}}({{rm{GHI}}}_{t,{rm{px}}} > {{rm{GHI}}}_{{rm{threshold}}})rangle }_{{rm{time; average}}}$$

where ({{rm{RH}}}_{t,{rm{px}}}) is the RH in the map pixel ({rm{px}}) at time (t), ({{rm{RH}}}_{{rm{threshold}}}) is the threshold of RH above which the device is assumed to operate, ({{rm{GHI}}}_{t,{rm{px}}}) is the GHI in the map pixel ({rm{px}}) at time (t), and ({{rm{GHI}}}_{{rm{threshold}}}) is the threshold of GHI above which the device is assumed to operate.

The population calculation was then conducted on these images in Google Earth Engine.

Zonal statistics were performed on the mean ophd images as integers (0–24) using a grouped image reduction (at 1,000-m scale) summing the population integer counts on the population without SMDW distribution image created previously (derived from WorldPop). This reduction was performed at 1,000 m. Validation was performed in Google Earth Engine on single countries within single ophd zones and showed insignificant error (<2%). The population results were collected as a table (feature collection) and population was summed cumulatively within stacked ophd zones. These were exported to R for plotting in Fig. 4b.

To assess the sensitivity of results to the choice of climate and population dataset, we performed a coincidence analysis (Fig. 4b) with alternative datasets and provide those results in Extended Data Fig. 1.

As an alternative climate dataset to ERA-5 (1 h, 9 km), we used NASA’s Global Land Data Assimilation System (GLDAS) 2.1 at 0.25° × 0.25° spatial resolution (nominally 30 km) and 3 h temporal resolution65 during the period concurrent with the main results, 2010–2019. As an alternative population dataset to WorldPop 2017, we used Oak Ridge National Laboratory’s LandScan 2017 ambient population counts at 1 km spatial resolution66. Two results comparisons were calculated: (1) GLDAS calculated with WorldPop 2017 for direct comparison of climate data input, and (2) GLDAS calculated with LandScan for comparison of climate and population dataset substitution.

The intercomparisons suggest there is negligible sensitivity to the population dataset used, but substantial and systematic sensitivity to the climate dataset used, while all intercomparisons agree in main features and qualitative conclusions. The spatially and temporally (3×) coarser GLDAS dataset consistently results in lower predictions of water output and impact than the finer ERA-5 climate reanalysis. We speculate that the 3-h timesteps of GLDAS are insufficient to capture the performance-critical humidity and GHI dynamics throughout the day (probably morning and evening hours), and, similarly, the 30-km pixels are insufficient to resolve fine-scale climate patterns driven by topographic and other microscale physiographic effects. This illustrates the importance of using high-resolution climate datasets.

### Variability statistics of AWH output

To go beyond annual averages and study availability, we introduce a set of metrics we named moving average density 90th percentile (MADP90).

The MADP90-t represents a device’s average output rate (l d−1 m−2) that will be exceeded for 90% of periods lasting t days at the given location. MADP90 is calculated from the derived P90 value across a probability density function (PDF) of daily mean output during each t-day window in the time series (2010−2019). The result is a scalar that can be mapped spatially. Moving-window periods of 1, 7, 30, 60, 90 and 180 days were examined in this study. MADP90-results are available as additional results and map layers in AWH-Geo.

Extended Data Fig. 3 provides an example set of PDFs for a location in southwest Tanzania. Each of the P90 values correspond to a version of the MADP90 metric corresponding to a moving window period. The P90 value naturally increases with t in most geographic locations as the PDF tightens its dispersion about the natural (P50) mean.

Source: Resources - nature.com