Overview of hydrologic and ecological mapping protocol
Mapping hydrologic and ecological alteration at the stream reach level followed a 7-step process that builds upon several previously published methods (Fig. 1). The steps include: (1) compiling a nationwide dataset of streamflow gauges from the US Geological Survey (USGS) and distinguishing reference and non-reference gages and associated records21,22,23, (2) assembling stream flow records and calculating hydrologic indices23, (3) quantifying hydrologic alteration for stream gages22, (4) developing models to predict hydrologic alteration from human disturbance variables24, (5) using models to extrapolate hydrologic alteration to ungauged stream reaches24, (6) developing empirical models of fish species richness responses to hydrologic alteration17, and (7) mapping fish richness responses to ungauged stream reaches based on modeled estimates of hydrologic alteration. Methodological details are provided in each of the publications cited above; however, an overview of the steps is provided here. We elaborate more fully on the detailed methodology starting at step 3, as this reflects more of the focus of the technical validation of the dataset (Fig. 1).
Step 1 – Compiling a nationwide streamflow dataset
We assembled streamflow information for 7,088 USGS stream gages with at least 15 years of daily discharge data as of 2010. We only included gages with at least 15 years of complete annual records (i.e., those with ≤ 30 days of missing daily data). The influence of climate variability on hydrologic statistic values dampens as periods of record lengthen, and typically at least 15 years of record are required to stabilize variation in indices to levels acceptable for spatial analyses25. A noted limitation of our analysis is that, unlike Eng et al.26, we did not formally control for climate variation when calculating hydrologic statistics from stream records of varying length and duration. However, Eng et al.26 found that climate variability had minimal influence on hydrologic alteration at stream gages relative to human land and water management.
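The record-length screen described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary input format, and the use of calendar years (rather than water years) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def complete_years(daily_records, max_missing_days=30):
    """Return the sorted years having at most `max_missing_days` missing daily values.

    daily_records: dict mapping datetime.date -> discharge value,
    where None (or an absent date) marks a missing day.
    Assumption: a "year" is a calendar year.
    """
    observed = defaultdict(int)
    for day, flow in daily_records.items():
        if flow is not None:
            observed[day.year] += 1
    years = []
    for year, n_obs in observed.items():
        days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days
        if days_in_year - n_obs <= max_missing_days:
            years.append(year)
    return sorted(years)


def qualifies(daily_records, min_years=15, max_missing_days=30):
    """True if the gage has at least `min_years` complete annual records."""
    return len(complete_years(daily_records, max_missing_days)) >= min_years
```

A gage whose record covers 15 years, each missing no more than 30 days, passes the screen. One with 31 missing days in every year does not.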
Gages were partitioned into reference (n = 2,249) and non-reference (n = 4,839) condition based on geographical evaluations of human disturbance regimes in basins upstream of each gage, reviews of USGS water reports, and visually inspecting flow duration curves21,22. For ease of terminology, we use the term “reference” to indicate the “least disturbance” condition for a region, as defined by Stoddard et al.27. For streams currently regulated by dams and with at least 15-yr records extending prior to dam regulation (n = 250), streamflow records were partitioned into pre- and post-dam construction time periods22. Except for pre-dam construction records, periods of record for reference conditions displayed considerable temporal overlap (at least 50% of overlap in records)23, as did periods of record for non-reference conditions22.
Step 2 – Assembling streamflow records and calculating hydrologic statistics
Daily streamflow records were obtained from the USGS National Water Information System (NWIS) website (https://waterdata.usgs.gov/nwis). For reference streams, the entire period of record was used for calculating hydrologic statistics, whereas for non-reference gages, only periods overlapping with contemporary human disturbance regimes were used (1980–2010) due to the temporal limitations of anthropogenic disturbance variables used to predict hydrologic alteration. The National Hydrologic Assessment Tool (www.sciencebase.gov/catalog/item/5387735ee4b0aa26cd7b5461) was used to calculate 110 hydrologic statistics summarizing the magnitude, frequency, duration, timing, and rate of change in flow for all reference and non-reference stream flow records23.
Step 3 – Calculating hydrologic alteration indices at gages
Calculating hydrologic alteration at non-reference gages first required estimating reference or natural hydrologic conditions as a baseline from which the degree of alteration could be assessed. Of the 110 hydrologic statistics above, we selected 41 indices that adequately represent the multi-dimensional nature of regional variation in hydrology across the US23 and have been used in previous assessments of hydrologic alteration22 (Table 1). These 41 indices include the Indicators of Hydrologic Alteration12, a series of non-redundant metrics representing the predominant variation embodied by almost 200 hydrologic variables28. Since the 41 indices are univariate summaries of hydrologic conditions, two additional indices, a hydrologic alteration index and a seasonality alteration index, were calculated to represent multivariate impacts to overall variation among hydrologic metrics and the distribution of monthly flows, respectively (more details provided in Step 3 expanded).
Generally, reference condition values of hydrologic indices were estimated for non-reference gages using random forest statistical models constructed from reference gauges or gauges with pre-dam hydrologic records22. Random forest model performance was high with a median variance explained of 91% among all hydrologic indices and median normalized RMSE of 0.51322 (normalized RMSE by range of values). In cases where statistical models were unreliable (i.e., indices depicting timing of low and high flows), non-reference stream gages were assigned to a hydrologic class representing a range of reference condition index values22. In these situations, the 90th percent confidence interval of hydrologic index values represented by all reference gauges within a hydrologic class was used to represent the reference flow condition. Observed hydrologic indices were then compared to estimated reference conditions to calculate hydrologic alteration indices, characterizing the degree of changes in stream flow due to human influence (see next section).
Steps 4 and 5 – Predicting hydrologic alteration and mapping to U.S. streams
Random forest models were constructed to predict each hydrologic alteration index at stream gages using an ensemble of human disturbance variables summarized in the upstream basins contributing to each gage. Predictor variables included landcover, dam storage, infrastructure, and water withdrawals (Supplementary Table 1). Random forest models were developed for the entire US and for each of 29 ecohydrologic regions (Fig. 2), which represent unique combinations of Freshwater Ecoregions29 and two-digit hydrologic unit codes (i.e., major river basins). The same human disturbance variables were compiled in the networks upstream of all NHDPlus V1 stream reaches (https://nhdplus.com/NHDPlus/NHDPlusV1_home.php) and models were then applied to predict hydrologic alteration in those reaches. Values were then extended to NHDPlus V2 stream reaches (https://www.epa.gov/waterdata/get-nhdplus-national-hydrography-dataset-plus-data) using crosswalk tables.
Steps 6 and 7 – Predicting and mapping ecological responses to hydrologic alteration in stream reaches
Comprehensive maps of hydrologic alteration in stream reaches provide a foundation for subsequent modeling efforts, such as evaluating ecological responses to altered streamflow. Once these flow-ecology relationships are developed for multiple regions, ecological conditions, similar to hydrologic conditions, can be extrapolated to stream reaches. Recently, George et al.17 used the same hydrologic alteration indices reported in this study to develop regionally explicit nationwide flow-alteration-ecological-response relationships. These models were used to extrapolate ecological conditions (i.e., losses in fish species richness) to the stream reach resolution based on modeled hydrologic alteration values.
Step 3 expanded: Calculating hydrologic alteration indices for stream gages
We elaborate on the above methodology, starting here with step 3. We used two approaches for calculating hydrologic alteration for non-reference stream gages24. For the majority of indices (Table 1), we calculated hydrologic alteration as the proportional change of observed (O) index values (i.e., human-altered conditions) relative to expected (E) index values (i.e., reference conditions) using the equation (O – E)/E30. Hence, indices ranged from −1 to values ≫ 1. Performance of preliminary models using these raw values, as measured by area under the curve (AUC), was lower than desired (AUC < 0.7). Hence, following Eng et al.30, we scaled values from 0 to 1 to represent a likelihood of hydrologic alteration as follows. For indices ≤ 1, we used the absolute value of (O – E)/E, whereas indices > 1 were assigned the maximum value of 1. For reference gages, hydrologic alteration values were set to 0 for each metric.
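The scaling rule above amounts to taking the absolute proportional change and capping it at 1. A minimal Python sketch follows. The function name and the error raised for a zero expected value are illustrative assumptions, not part of the published method.

```python
def alteration_index(observed, expected):
    """Scale the proportional change (O - E) / E to a 0-1 likelihood of alteration.

    Raw values at or below 1 are replaced by their absolute value;
    raw values above 1 are capped at 1.
    """
    if expected == 0:
        # Assumption: the ratio is undefined here, so the caller must handle it.
        raise ValueError("expected (reference) value must be non-zero")
    raw = (observed - expected) / expected
    return min(abs(raw), 1.0)
```

For example, an observed value of 50 against an expected value of 100 gives a raw change of −0.5 and a scaled index of 0.5. An observed value of 300 gives a raw change of 2, which is capped at 1.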
While the above measures are informative for individual flow components, indices that summarize the multi-dimensional nature of stream flow alterations provide convenient single measures of alteration. We calculated a seasonality index, analogous to Zaerpour et al.7, representing shifts in the monthly flow magnitudes, as cumulative differences in observed (O) and expected (E) values for all mean monthly flows using the following equation:
$$\sum_{i=1}^{12}\left(\frac{O_{i}-E_{i}}{E_{i}}\right)$$
(1)
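Equation 1 can be evaluated directly from two sets of 12 mean monthly flows. The Python sketch below is illustrative only. The function name and input format are assumptions, and it implements the sum exactly as printed, with signed differences.

```python
def seasonality_index(observed_monthly, expected_monthly):
    """Sum of proportional differences (O_i - E_i) / E_i over the 12 mean monthly flows."""
    if len(observed_monthly) != 12 or len(expected_monthly) != 12:
        raise ValueError("both inputs must contain 12 monthly values")
    return sum((o - e) / e for o, e in zip(observed_monthly, expected_monthly))
```

Because the differences are signed in the equation as printed, a 10% increase in six months and a 10% decrease in the other six sum to approximately zero under this literal reading.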
As a second multidimensional measure, we calculated a cumulative hydrologic alteration index (HAI), which evaluates the degree of separation between the flow regime of non-reference streams and that of reference streams within the same hydrologic class. Hydrologic classes represent groups of streams that share similar natural hydrologic patterns. McManamay et al.23 developed a hydrologic classification of reference streams in the US, and subsequently, non-reference gages were assigned to those hydrologic classes using models22. To calculate the HAI, all 110 hydrologic metrics (step 2 above) for reference and non-reference gages were centered, scaled from 0 to 1, and assessed in a principal components analysis (PCA)24. Thirteen of the components were significant according to the broken-stick method31. We partitioned the 13 principal component scores by hydrologic class membership and calculated 90th percentile confidence intervals for only reference streams. The confidence interval (a…b) for significant PC scores (S) is represented by the lower (a) and upper (b) bounds. For each non-reference gage and each significant PC, we calculated a rank (r) value using the following:
$$\mathrm{If}\;a_{i}\le S_{i}\le b_{i}\;\mathrm{is\;true,\;then}\;r_{i}=0;\;\mathrm{otherwise}\;r_{i}=V_{i},$$
(2)
where V_i is the eigenvalue for the ith significant PC. The HAI was then calculated for each non-reference gage using:
$$\sum_{i=1}^{n}\left|S_{i}-a_{i}\right|\ast r_{i}\;\mathrm{for}\;S_{i}<a_{i},\;\mathrm{and}\;\sum_{i=1}^{n}\left|S_{i}-b_{i}\right|\ast r_{i}\;\mathrm{for}\;S_{i}>b_{i}$$
(3)
The formula accounts for both the degree of alteration of the PC (i.e., |S_i − a_i| or |S_i − b_i|) as well as the importance of each PC to overall variability in hydrologic regimes (i.e., the eigenvalue, V_i).
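Equations 2 and 3 can be read together as a single eigenvalue-weighted sum of the distances by which each significant PC score falls outside the reference interval. The Python sketch below follows that reading, which is an interpretation of the printed formula. The function name and list-based inputs are assumptions for illustration.

```python
def hydrologic_alteration_index(scores, lower, upper, eigenvalues):
    """Eigenvalue-weighted distance of PC scores outside their reference intervals.

    scores:      S_i, PC scores for one non-reference gage
    lower/upper: a_i and b_i, reference interval bounds for the gage's hydrologic class
    eigenvalues: V_i, eigenvalue of each significant PC
    """
    if not (len(scores) == len(lower) == len(upper) == len(eigenvalues)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for s, a, b, v in zip(scores, lower, upper, eigenvalues):
        if s < a:
            total += abs(s - a) * v  # below the interval: r_i = V_i
        elif s > b:
            total += abs(s - b) * v  # above the interval: r_i = V_i
        # inside [a, b]: r_i = 0, so the PC contributes nothing
    return total
```

With scores (0, 2, −3), bounds of −1 and 1 on every PC, and eigenvalues (5, 3, 2), the first PC contributes nothing, the second contributes 1 × 3, and the third contributes 2 × 2, for an HAI of 7 before regional rescaling.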
To ensure all metrics were on a similar scale, both the seasonality index and HAI were scaled from 0 to 1 for each ecohydrologic region based on:
$$\frac{x_{i}-\min(x)}{\max(x)-\min(x)}$$
(4)
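Equation 4 is a min-max rescaling applied separately within each ecohydrologic region. A minimal Python sketch follows. The function name, the parallel-list input format, and the choice to return 0 when all values in a region are identical are assumptions for the example.

```python
def min_max_scale_by_region(values, regions):
    """Rescale each value to 0-1 using the min and max of its own region."""
    if len(values) != len(regions):
        raise ValueError("values and regions must have the same length")
    bounds = {}
    for x, region in zip(values, regions):
        lo, hi = bounds.get(region, (x, x))
        bounds[region] = (min(lo, x), max(hi, x))
    scaled = []
    for x, region in zip(values, regions):
        lo, hi = bounds[region]
        # Assumption: a region with no spread maps to 0 rather than dividing by zero.
        scaled.append(0.0 if hi == lo else (x - lo) / (hi - lo))
    return scaled
```

For instance, values 2, 4, and 6 in one region scale to 0, 0.5, and 1, regardless of the values found in other regions.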
Steps 4 and 5 expanded: Hydrologic alteration models and mapping
Random forests32 were constructed to model all hydrologic alteration indices as binomial distributions using 50 predictor variables summarizing natural characteristics and human disturbances, such as landscape alteration and infrastructure (Supplementary Table 1, see24). Random forests are a form of machine learning where large numbers of decision trees are constructed in an iterative fashion using a bootstrapped subsample of observations and subsets of variables32. The remaining observations are termed the out-of-bag (OOB) sample, which is used in cross-validation measurements of variance explained, error, and variable importance. Each tree is constructed from training data and then predictions are combined among all trees. We used the randomForest package33 in the R programming environment to develop tree-based models for the entire US (all gauges) and separately for each ecohydrologic region. Hence, with 43 hydrologic indices and 29 regions, over 1,000 forest models were generated. Predictor variables used in models are classified into 8 groups (number of variables in parentheses): urbanization (14), agriculture (10), dams and reservoirs (6), power generation (6), dischargers and flow modifiers (5), human disturbance indices (3), basin size, stream size, and climate (3), and natural land cover (3) (Supplementary Table 1). Predictor variables were obtained from multiple sources or our own geospatial analysis and were either summarized for the local catchment surrounding the stream reach containing each stream gauge or accumulated for the entire catchment contributing to each gauge (Supplementary Table 1). Similarly, the same predictor variables were compiled for all 2.6 million NHDPlus V1 stream reaches, both for local catchments and entire stream networks upstream of each reach. Following construction and calibration, random forest models were used to extrapolate hydrologic alteration values to all 2.6 million NHDPlus V1 stream reaches in the CONUS (Fig. 2).
Using crosswalks between NHDPlus V1 and V2, we extended hydrologic alteration values to NHDPlus V2 stream reaches.
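The in-bag and out-of-bag split that underlies each tree in a random forest can be illustrated with a short Python sketch. It shows only the resampling step, not tree fitting, and it is not the R randomForest code used in the study. The function name and the choice to draw as many samples as there are observations are assumptions for the example.

```python
import random


def bootstrap_split(n_obs, rng):
    """Draw a bootstrap sample of row indices and return (in_bag, out_of_bag).

    in_bag is sampled with replacement, so it may contain repeats;
    out_of_bag holds every index that was never drawn.
    """
    in_bag = [rng.randrange(n_obs) for _ in range(n_obs)]
    out_of_bag = sorted(set(range(n_obs)) - set(in_bag))
    return in_bag, out_of_bag


# Example: one split for 100 gages with a fixed seed for repeatability.
in_bag, oob = bootstrap_split(100, random.Random(0))
```

Each tree would be trained on its own `in_bag` rows and evaluated on its `oob` rows, which is what allows error and variable importance to be estimated without a separate hold-out set.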
Steps 6 and 7 expanded: Ecological alteration models and mapping
To develop flow-alteration-ecological response relationships, George et al.17 developed a comprehensive dataset of overlapping hydrologic and ecological data for 6,452 stream reach locations. At each location, measures of hydrologic alteration and ecological alteration were compiled, where ecological alteration was measured as the deviation in observed native fish richness from expected natural conditions17. Flow-alteration-ecological-response relationships typically adopt a “wedge-shaped” distribution well-suited for quantile regression34. Hence, George et al. generated quantile regression models predicting 50th, 75th, and 95th percentile alterations in fish richness from hydrologic alteration values for all hydrologic metrics, except HAI, within each 4-digit hydrologic unit code (HUC-4), except watersheds where limited sample size prohibited model development (12% of watersheds).
Quantile regression model coefficients for each hydrologic metric within each HUC-4 were used to predict alterations in native fish richness at the stream reach resolution based on modeled estimates of hydrologic alteration. In situations where coefficients were unavailable for HUC-4s, average model coefficients for entire ecohydrologic regions were used. Residuals in fish richness were calculated for each hydrologic alteration metric (e.g., Fig. 3a). Flow thresholds or tipping points represent hydrologic alteration values beyond which ecological degradation is expected13 or can be deemed socially unacceptable15. Presuming that loss of any native fish species is unacceptable, the hydrologic alteration value at which residuals in fish richness fall below 0 is considered the threshold or limit. Based on the quantile regression models17, thresholds were identified and applied to all stream reaches based on HUC-4 or ecohydrologic region. Modeled hydrologic alteration values for each metric were compared to each respective threshold in each stream reach to yield a binary response where losses in fish biodiversity are expected (1) or not (0), depending on whether the hydrologic threshold was exceeded. The mean value among these responses for all hydrologic alteration metrics yields a probability of fish biodiversity loss (Fig. 3b), ranging from 0 to 1, based on all components of the flow regime.
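The final step, comparing each metric to its threshold and averaging the binary outcomes, can be sketched in Python as below. The function name, the dictionary inputs, and the metric keys are hypothetical placeholders. Treating "exceeded" as strictly greater than the threshold is also an assumption.

```python
def probability_of_loss(alteration, thresholds):
    """Mean of binary threshold exceedances across hydrologic alteration metrics.

    alteration: dict of metric name -> modeled alteration value for one stream reach
    thresholds: dict of metric name -> threshold for that reach's HUC-4 or region
    """
    if not thresholds:
        raise ValueError("at least one metric threshold is required")
    # 1 where the modeled value exceeds the threshold, otherwise 0
    flags = [1 if alteration[m] > thresholds[m] else 0 for m in thresholds]
    return sum(flags) / len(flags)


# Hypothetical reach with four placeholder metrics sharing a threshold of 0.5.
reach = {"metric_a": 0.6, "metric_b": 0.2, "metric_c": 0.9, "metric_d": 0.1}
limits = {name: 0.5 for name in reach}
p_loss = probability_of_loss(reach, limits)
```

Here two of the four metrics exceed their thresholds, so the reach receives a probability of fish biodiversity loss of 0.5.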
Source: Ecology - nature.com