The widespread and unjust drinking water and clean water crisis in the United States
Data sources

Data for this analysis were extracted from the American Community Survey (ACS) 5-year estimates for 2014–2018 via the Integrated Public Use Microdata Series – National Historical Geographic Information System (IPUMS-NHGIS)26, and from the Environmental Protection Agency’s (EPA) Enforcement and Compliance History Online (ECHO) Exporter27. Data were extracted at the county level for all 50 states, Washington DC, and Puerto Rico. The ACS is an ongoing survey of the United States that documents a wide variety of social statistics, ranging from simple population counts to housing characteristics. Because of the staggered sampling structure of the ACS, it takes 5 years for every county to be sampled, so researchers must use 5-year intervals to ensure complete coverage. The data from these 5 years are then used to produce estimates for all counties in the United States for the 5-year period in question. At the time of this study, 2014–2018 was the most recent period available.

ECHO collates data from EPA-regulated facilities across the United States to report compliance, violation, and penalty information for all facilities over the most recent 5-year interval. ECHO data are updated weekly, and the data for this paper were extracted on 18 August 2020. The data in our analysis therefore represent the status of each community water system or Clean Water Act permittee, as reported by the EPA, as of 18 August 2020. Only community water systems and Clean Water Act permittees listed as Active by ECHO were included in this analysis. Because ECHO data are at the level of the water system, permittee, or utility, we aggregated them up to the county level.

Safe Drinking Water Act data were geolocated in QGIS 3.10 from latitude and longitude, because other geographic identifiers for these data were often missing.
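The county aggregation step can be sketched in Python as below. This is a minimal illustration, not the authors' code: the record layout `(county_fips, status, in_serious_violation)` and the function name `county_violation_rates` are hypothetical and do not correspond to actual ECHO Exporter column names. The sketch keeps only records listed as Active and returns, for each county, the percent of those records that are flagged.

```python
from collections import defaultdict


def county_violation_rates(records):
    """Return {county_fips: percent of Active records flagged as violating}.

    Each record is a hypothetical (county_fips, status, in_serious_violation) tuple.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for county, status, violating in records:
        if status != "Active":  # only systems/permittees listed as Active are kept
            continue
        totals[county] += 1
        if violating:
            flagged[county] += 1
    return {c: 100.0 * flagged[c] / totals[c] for c in totals}


# Made-up records for illustration only.
records = [
    ("01001", "Active", False),
    ("01001", "Active", True),
    ("01001", "Inactive", True),  # dropped: not Active
    ("01003", "Active", False),
]
print(county_violation_rates(records))  # {'01001': 50.0, '01003': 0.0}
```

Counties with no Active records simply do not appear in the result, which leaves it to a later step to decide whether such counties are mapped as zero or treated as missing.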
In line with prior work4,5,7,8, and to obtain a cleaner dataset, we focus only on water systems labeled as community water systems. Each community water system was assigned to the county containing its latitude and longitude; if a system's coordinates fell over water, a nearest-neighbor join was used instead. In total, 1334 of 49,479 community water systems were dropped because they had no reported latitude or longitude. Of these, 54 (4.0%) were reported as in serious violation.

Active Clean Water Act permittees were first located by their listed county, since 345,176 of 350,476 permittees had a county reported. Permittees without a reported county were located by latitude and longitude in the same manner as community water systems. Ten permittees had neither latitude and longitude nor a listed county and were excluded from our analysis; of these, seven were in Significant Noncompliance and three were not. Because some Clean Water Act permittees had latitude and longitude placing them far from the United States, those more than 100 km from their nearest county were excluded from analysis. Finally, for both community water systems and Clean Water Act permittees, some counties (76 for community water systems and 13 for Clean Water Act permittees) had no reported cases. These counties were treated as zeroes for cartography and as missing for modeling.

Similar to prior work in this area4,5,8, we restrict our analysis to the county scale for reasons related to data limitations and the resulting conceptual validity. Although counties are arguably larger in geographic area than is ideal for an environmental injustice analysis, using a smaller unit for which data are available, such as the census tract, would limit the conceptual validity of the analysis because of the apolitical nature of such units.
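The point-to-county assignment described above can be sketched as follows, assuming county boundaries are available as simple polygon rings of (longitude, latitude) vertices. The paper performed this step in QGIS; the Python below is a hypothetical stand-in that uses a ray-casting point-in-polygon test, a nearest-county fallback for points that fall outside every county (for example, over water), and an optional distance cut-off such as the 100 km rule applied to Clean Water Act permittees. Distances use a local equirectangular approximation, which is adequate over short distances but simplifies proper geodesic measurement. The function names and the toy counties `A` and `B` are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) falls inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def _project_km(lon, lat, lat0):
    """Equirectangular projection (km) around latitude lat0; fine for short distances."""
    x = math.radians(lon) * math.cos(math.radians(lat0)) * EARTH_RADIUS_KM
    y = math.radians(lat) * EARTH_RADIUS_KM
    return x, y


def distance_to_ring_km(lon, lat, ring):
    """Approximate distance (km) from a point to the nearest edge of a polygon ring."""
    px, py = _project_km(lon, lat, lat)
    best = math.inf
    n = len(ring)
    for i in range(n):
        ax, ay = _project_km(ring[i][0], ring[i][1], lat)
        bx, by = _project_km(ring[(i + 1) % n][0], ring[(i + 1) % n][1], lat)
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Position of the closest point on the segment, clamped to [0, 1].
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best


def assign_county(lon, lat, counties, max_km=None):
    """Return the FIPS of the county containing the point, else the nearest county.

    Returns None if coordinates are missing, or if max_km is set and the
    nearest county is farther away than max_km.
    """
    if lon is None or lat is None:
        return None  # no coordinates: the record is dropped
    for fips, ring in counties.items():
        if point_in_polygon(lon, lat, ring):
            return fips
    # Nearest-neighbor fallback, e.g. for points that fall over water.
    fips, dist = min(
        ((f, distance_to_ring_km(lon, lat, r)) for f, r in counties.items()),
        key=lambda pair: pair[1],
    )
    if max_km is not None and dist > max_km:
        return None
    return fips


# Two made-up square "counties" given as (longitude, latitude) vertices.
counties = {
    "A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "B": [(2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (2.0, 1.0)],
}
print(assign_county(0.5, 0.5, counties))               # A (inside)
print(assign_county(1.4, 0.5, counties))               # A (nearest county)
print(assign_county(10.0, 0.5, counties, max_km=100))  # None (about 780 km away)
```

Passing no `max_km` reproduces the unconditional nearest-neighbor join used for community water systems, while `max_km=100` mirrors the exclusion rule for distant permittees.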
As outlined above, ECHO data are messy and missing many geographic identifiers. What is provided is generally either the county or the latitude and longitude. If only the county is provided, we are constrained to using the county regardless of conceptual validity. Even when latitude and longitude are provided, as is the case for many observations, the point location says nothing about which households the water system or permittee serves or affects. Whatever geographic unit we use therefore carries the assumption that those within the unit could plausibly be affected by the water system or permittee. Because counties are often responsible both for regulating drinking water and for maintaining and providing water infrastructure29, we were comfortable making this assumption linking point location to presumed spatial impact at the county scale. We believe, however, that the assumption would have been invalid and untestable for smaller apolitical units with available demographic data, such as census tracts.

Beyond the issues presented by ECHO data, the county is also the appropriate scale for this study because of the estimate-based nature of the ACS. ACS estimates are based on a rolling 5-year sample and often have very large margins of error. At the census tract level these errors can be massive, especially in rural areas30,31,32. Given this variation, and the need to include all rural areas in the analysis, the county, where margins of error are considerably smaller, is the appropriate unit for this study. That said, the county is larger than the units often desired or used in environmental justice studies. Studies focused exclusively on urban areas with clearer pathways of impact can and should use smaller units such as census tracts.
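The point about margins of error can be made concrete with the coefficient of variation (CV). ACS margins of error are published at the 90% confidence level, so the standard error is the margin of error divided by 1.645, and the CV expresses that standard error as a percentage of the estimate. The figures below are hypothetical, chosen only to show how a small tract-level count can carry a far larger relative error than a county-level count; they are not values from this study.

```python
Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def coefficient_of_variation(estimate, moe):
    """CV (%) of an ACS estimate, derived from its published 90% margin of error."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    standard_error = moe / Z_90
    return 100.0 * standard_error / estimate


# Hypothetical numbers for illustration only.
tract_cv = coefficient_of_variation(estimate=12, moe=15)    # small tract-level count
county_cv = coefficient_of_variation(estimate=340, moe=90)  # larger county-level count
print(f"tract CV: {tract_cv:.1f}%, county CV: {county_cv:.1f}%")  # tract CV: 76.0%, county CV: 16.1%
```

A CV that large means the tract estimate says little about whether a few households, or none, lack complete plumbing, which is why aggregating to the county matters for rare outcomes.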
It will be imperative for future scholarship on water hardship across the rural-urban continuum to gain access to reliable data on sub-county political units, as well as data linking water systems to users, in order to continue documenting and pushing for water justice.

Dependent variables

The dependent variables for this analysis were assessed in both continuous and dichotomous form. Continuous measures were used for descriptive results and mapping. For models of water injustice, we used a dichotomous measure classifying each county as having either a low or an elevated level of the specific water issue, because water access and quality issues are rare relative to the whole United States population. For all three outcomes, we benchmark an elevated level of the issue as a level that would be viewed as unacceptable under United Nations Sustainable Development Goal 6.1, which states, “by 2030 achieve universal and equitable access to safe and affordable drinking water for all”1. As this goal focuses on ensuring that all people have safe water, we deem a county to have an elevated level of the issue if >1% of households, community water systems, or permittees had incomplete plumbing, were in Serious Violation, or were in Significant Noncompliance, respectively. Although we could have used an even stricter threshold given the SDG’s emphasis on access for all people, we use 1% as our cut-off because of its nominal value and ease of interpretation.

For water access, the continuous measure was the percent of households in a county with incomplete plumbing, as reported by the ACS. The ACS currently asks respondents whether they have access to hot and cold water, a sink with a faucet, and a bath or shower. Up until 2016, the question also included a flush toilet33.
Because we must use the 2014–2018 5-year estimates to obtain full coverage of all counties, incomplete plumbing in this item may or may not include a flush toilet, depending on when the specific county was sampled. The dichotomous version of this variable benchmarked elevated levels of incomplete plumbing as whether or not 1% or more of households in a county had incomplete plumbing.

Water quality was assessed using both community water system data under the Safe Drinking Water Act and permit data under the Clean Water Act. For Safe Drinking Water Act data, the continuous measure was the percent of community water systems within a county classified as a Safe Drinking Water Act Serious Violator at the time of data extraction. The EPA assigns point values of 1, 5, or 10 based on the severity of violations of the Safe Drinking Water Act. A Serious Violator is one who has “an aggregate score of at least eleven points as a result of some combination of: unresolved more serious violations (such as maximum contaminant level violations related to acute contaminants), multiple violations (health-based, monitoring and reporting, public notification and/or other violations), and/or continuing violations”27. The dichotomous measure benchmarked elevated rates of Safe Drinking Water Act Serious Violation as whether or not >1% of a county's community water systems were classified as Serious Violators.

For Clean Water Act permit data, the continuous measure was the percent of permit holders listed as in Significant Noncompliance at the time of data extraction. Significant Noncompliance under the Clean Water Act refers to permit holders who may pose a “more severe level of environmental threat” and is based on both pollution levels and reporting compliance27.
The dichotomous measure again set the threshold for elevated levels of poor water quality at whether or not >1% of Clean Water Act permittees in a county were listed as in Significant Noncompliance at the time of data extraction.

Independent variables

The independent variables in our models of water injustice are those frequently shown to be related to environmental injustice in the United States: age, income, poverty, race, ethnicity, education, and rurality17,18,19,20,21,22,23,24,25. Age was included as median age, and income as median household income. Poverty was the county poverty rate as determined by the official poverty measure of the United States34. Race and ethnicity were included as percent non-Latino/a Black, percent non-Latino/a Indigenous, and percent Latino/a. Because the focus was on indigeneity, percent American Indian or Alaska Native was combined with percent Native Hawaiian or Other Pacific Islander. We did not include percent non-Latino/a white because of multicollinearity. Finally, rurality was included as a three-category county indicator (metropolitan, non-metropolitan metropolitan-adjacent, and non-metropolitan remote), as determined by the Office of Management and Budget (OMB) in 201035. The OMB classifies a county as metropolitan if it has a core urban area of 50,000 or more people, or is connected to a core metropolitan county by a commuting share of 25% or greater35. A non-metropolitan county is any county not classified as metropolitan. Non-metropolitan metropolitan-adjacent counties are those that immediately border a metropolitan county, and non-metropolitan remote counties are those that do not.

Water injustice modeling approach

Water injustice was assessed by estimating linear probability models for the three dichotomous outcome variables, with state fixed effects to control for visible state-level heterogeneity and differences in policy, reporting, and enforcement (e.g.
the clear state boundary effects in Fig. 3). We use cluster-robust standard errors at the state level to account for both heteroskedasticity and within-state similarity. All modeling was performed in Stata 16.0, and mapping in QGIS 3.10. We assessed all full models for multicollinearity using the condition index and variance inflation factors (VIFs); the independent variables had an acceptable condition index of 5.48, well below the conservative cut-off of 15, as well as VIF values of … 20). All indications of statistical significance are at the p …
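The modeling approach described above can be sketched with NumPy: dichotomize an outcome at the 1% benchmark, fit a linear probability model by OLS with state fixed effects, compute state-clustered standard errors, and run the two multicollinearity diagnostics. The authors used Stata 16.0, so this is only an illustrative re-implementation under stated assumptions. State fixed effects are entered as one dummy per state. The small-sample factor G/(G−1)·(n−1)/(n−p) mirrors the correction commonly applied to clustered OLS standard errors. The condition index follows the Belsley approach of scaling design columns (including the constant) to unit length. The data are synthetic, and the function and covariate names are hypothetical.

```python
import numpy as np


def elevated(pct, threshold=1.0):
    """Dichotomize county percentages: 1.0 if pct > threshold, else 0.0."""
    return (np.asarray(pct, dtype=float) > threshold).astype(float)


def lpm_state_fe(y, X, state):
    """OLS linear probability model with state dummies and state-clustered SEs.

    Returns (coefficients, standard errors) for the columns of X only.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    state = np.asarray(state)
    states = np.unique(state)
    # One dummy per state (no separate constant) absorbs state-level differences.
    D = (state[:, None] == states[None, :]).astype(float)
    Z = np.hstack([X, D])
    n, p = Z.shape
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ (Z.T @ y)
    resid = y - Z @ beta
    # Sandwich "meat": sum over states of the outer product of each state's score.
    meat = np.zeros((p, p))
    for s in states:
        in_s = state == s
        score = Z[in_s].T @ resid[in_s]
        meat += np.outer(score, score)
    G = len(states)
    c = (G / (G - 1)) * ((n - 1) / (n - p))  # small-sample correction
    V = c * (ZtZ_inv @ meat @ ZtZ_inv)
    k = X.shape[1]
    return beta[:k], np.sqrt(np.diag(V)[:k])


def vif(X):
    """VIF for each column of X: 1 / (1 - R^2) from regressing it on the others."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        sst = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out.append(sst / (resid @ resid))  # equals 1 / (1 - R^2)
    return np.array(out)


def condition_index(X):
    """Largest condition index of the design with a constant, columns scaled to unit length."""
    Z = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    Z = Z / np.linalg.norm(Z, axis=0)
    sv = np.linalg.svd(Z, compute_uv=False)
    return sv.max() / sv.min()


# Synthetic illustration (hypothetical data, not the study's).
rng = np.random.default_rng(0)
n = 600
state = rng.integers(0, 12, size=n)
X = rng.normal(size=(n, 2))  # two standardized, hypothetical covariates
pct = np.clip(0.8 + 0.6 * X[:, 0] + 0.1 * state + rng.normal(scale=0.5, size=n), 0, None)
y = elevated(pct)
beta, se = lpm_state_fe(y, X, state)
print("LPM coefficients:", beta, "clustered SEs:", se)
print("VIFs:", vif(X), "condition index:", condition_index(X))
```

Because a linear probability model is estimated by OLS, each coefficient reads directly as a change in the probability that a county has an elevated level of the issue. The clustered sandwich estimator keeps the standard errors valid under the heteroskedasticity that a binary outcome induces.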