UPRLIMET: UPstream Regional LiDAR Model for Extent of Trout in stream networks

UPRLIMET is our response to a need for a consistent method for predicting the upper extent of trout in all streams across land ownerships within our region. By developing and implementing the model using LiDAR-derived flowline hydrography, we offer a standardized, spatially explicit, spatially contiguous (where LiDAR hydrography is available), and high-quality fish-distribution layer based on the probability of fish presence. UPRLIMET maps both the probability of trout and the upper limit of trout across landscapes, ownerships, and jurisdictions, and better captures the upper extent of fish in headwater reaches than previous approaches. The result is a cross-boundary distribution map on which decision-makers and managers can base policies and regulations.

This work provides a transferable prediction modeling framework for systematically and comprehensively estimating the upper distribution limit of fish, which could be calibrated and implemented in watersheds and for fish species around the globe. Although the dependency on LiDAR-derived data here may be seen as a limitation to broader implementation of this method, the method is scalable to any resolution, and LiDAR is becoming increasingly ubiquitous in the United States through the U.S. Geological Survey 3D Elevation Program, which is funding LiDAR acquisitions across the United States. Furthermore, LiDAR data is available globally via data from GEDI and ICESAT-2 satellites that offer coarser resolution (~ 25-m) data that are still superior to either ASTER or SRTM derived-DEMs^{26, 27}.

Minimizing prediction errors for the upper limit of trout is important to decision support and management planning because it ensures that forest-harvest regulations and management prescriptions are aligned. It is important to note that the prediction error estimates from this study are derived from the NSpCV process, except for models using 20% slope thresholds or the unaltered parameterization of Fransen’s model¹³, because the NSpCV estimates are likely conservative. They tended to overestimate error, as evidenced by the fact that the Refit model (i.e., Fransen’s optimal model¹³ refit to our data) exhibited a larger MAE than the unchanged optimal Fransen model¹³. This unexpected result was likely due to applying the NSpCV routine to the Refit model, which characterizes predictive performance using many intermediate models fit to randomized subsets of independent training and test data. In contrast, the optimal Fransen model¹³ was developed independently of the data in this study, and thus its error could be evaluated directly without the subsampling imposed by NSpCV.

The relatively low error for the two-stage model that became UPRLIMET suggests that it more accurately characterizes the upper limit of fish than all other models considered in this study, including the Fransen model¹³, which has been used for estimating the upper limit of fish regionally. Although some of the models exhibited relatively small differences in error relative to the model that became UPRLIMET, small differences in predicted upper limit locations, when considered in aggregate across multiple watersheds, can potentially alter management decisions and expected outcomes. Differences in predictive performance and error between UPRLIMET and the optimal Fransen model¹³ are likely attributable to the high-accuracy hydrography and hydro-topographic data used here (LiDAR-derived DEMs were not available in western Washington in 2006), which allowed a finer scale of analysis (i.e., 5-m vs 10-m reaches). Additionally, the fact that UPRLIMET was fit to data solely from western Oregon likely offers predictive performance gains when applied to western Oregon compared to the Fransen model¹³, which was fit to data from western Washington.

Quantifying the predictive accuracy associated with applying UPRLIMET to western Washington will require new data and is outside the intended scope of this study. However, we think it is reasonable to infer findings from UPRLIMET across regions with similar climatic and hydro-topographic conditions, including northwestern California, western Oregon, western Washington, and southwestern British Columbia, especially given the broad availability of LiDAR-derived DEMs. This conclusion is supported by the fact that both the Fransen¹³ and Refit models produced similar logistic regression coefficients (Data S5) and similar Matthews Correlation Coefficients (Data S6), suggesting that the feature space of the two models is similar. This evidence is further corroborated by the high degree of overlap observed among the distributions of each of the four predictor variables for both western Oregon and Washington. We acknowledge that UPRLIMET does not contain identical predictor variables to Fransen’s model¹³ but maintain that they are similar enough in purpose that it is reasonable to assume that the feature space similarities are retained.
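For readers unfamiliar with the Matthews Correlation Coefficient (MCC) used to compare the classification sub-models, the following minimal Python sketch computes it from a 2 x 2 confusion matrix of fish / no-fish predictions. The function name and the example counts are illustrative assumptions and are not taken from Data S6.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    tp/tn: reaches correctly predicted as fish / no fish.
    fp/fn: reaches incorrectly predicted as fish / no fish.
    Returns a value in [-1, 1]; 0.0 is returned by convention when the
    denominator is zero (a row or column of the matrix is empty).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only.
print(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10))
```

Because MCC uses all four cells of the confusion matrix, it remains informative when fish-bearing and fishless reaches are unevenly represented, which is one reason it can be preferable to raw accuracy for this kind of comparison.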

When we undertook this study, we hypothesized that a prediction model based on RF would offer superior predictive performance over those based on LR, given the availability of 67 predictor variables and RF’s demonstrated superior predictive performance in ecological applications^{23, 24, 25}. However, our results suggest that no improvement is offered by including more than four of the 67 environmental predictors examined, and that no clear advantage is offered by employing the more complex RF model, as evidenced by three of the top five prediction models being four-variable LR models (Fig. 3; Data S3). The general importance of these variables to so many models is likely due to the strong linear relationships in the response of fish or no fish in logit space, given the slopes of the curves in the partial dependence profiles (Fig. 4). This finding is congruent with the fundamental premise of LR, which is to explain and predict a response with a functional relationship, whereas RF deliberately focuses only on maximizing prediction accuracy with many decision trees²⁸. Additional advantages of prediction models based on LR include the following: relatively better extrapolation performance than RF²⁹, the simplicity of transferring an LR model to another processing platform using the model coefficients (versus the black box of RF decisions), and the greatly reduced computational processing times associated with LR model fitting and prediction. These advantages are especially key to this work, where there may be a desire to implement the model on other landscapes without the requisite expertise in the R software³⁰. However, there are tradeoffs, as LR is more sensitive to the influence of outliers and multi-collinearity among variables, and overfitting is an increasing concern as the number of predictor variables increases, whereas RF tends to be robust to these concerns but is more likely to produce a high-variance, low-bias prediction model³¹.
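To illustrate why an LR model is simple to transfer between platforms, the sketch below applies a set of logistic regression coefficients to one stream reach in plain Python. The coefficient values, predictor names, and units are hypothetical placeholders chosen for illustration; they are not the fitted UPRLIMET coefficients, and any transformations applied to the predictors in the actual model are not represented here.

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted
# UPRLIMET values. Predictor names and units are placeholders.
COEFFICIENTS = {
    "intercept": -2.0,
    "upstream_length_m": 0.002,
    "drainage_area_km2": 0.8,
    "slope_percent": -0.15,
    "elevation_m": -0.001,
}


def trout_probability(reach: dict) -> float:
    """Probability of trout presence for one reach.

    Computes the linear predictor (intercept plus coefficient times
    predictor value) and passes it through the inverse-logit function.
    """
    logit = COEFFICIENTS["intercept"]
    for name, beta in COEFFICIENTS.items():
        if name != "intercept":
            logit += beta * reach[name]
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical reach attributes.
example_reach = {
    "upstream_length_m": 1500.0,
    "drainage_area_km2": 1.2,
    "slope_percent": 8.0,
    "elevation_m": 400.0,
}
print(trout_probability(example_reach))
```

The entire model reduces to a handful of numbers and one formula, so it can be reimplemented in a GIS field calculator or spreadsheet, whereas an RF model would require exporting every decision tree.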

Although there is no single, general explanation for distribution limits of species³², the intersection of stream size, slope, and elevation together locate the upper limit of fish. Stream size corresponds to major ecosystem changes along a stream continuum, including energy sources, ecosystem metabolism, habitat characteristics, and biodiversity³³, as well as the upper distribution limit of fish, as shown here. As expected, stream size accounts for the top two variables in the model, suggesting that it is the major driver of the upper distribution limit of fish, with the probability of trout increasing with increasing upstream stream length and upstream drainage area. Our finding suggests that downstream stream reaches are more likely to have fish. Although the underlying mechanisms have multiple influences, factors related to increasing stream size, such as increasing habitat size, habitat complexity, stability, or temperature variability³⁴, have been shown to be important. Similarly, stream size is the most sensitive factor in intrinsic potential models for Chinook Salmon (O. tshawytscha)³⁵. Slope, the next most important variable influencing the upper extent of fish, exerts control on physical habitats in streams, including channel morphology, hydraulics, sediment transport, substrate, and habitat³⁶. Steep slopes prevent trout from reaching areas above waterfalls or impassable chutes of over 25% slope, but trout can be found in stream channels without barriers at slopes as high as 28%^{7, 14, 37}. Other fishes, such as Coho Salmon (O. kisutch) and steelhead (O. mykiss), are generally not found above 12% slope³⁸. Interestingly, survival of fishes that make it upstream or are introduced above barriers may be facilitated by a geomorphic setting that is less prone to debris flows and other episodic sediment fluxes and has a greater resilience to flooding resulting from wider valleys and greater floodplain connectivity³⁹.
Elevation or vertical topographic position may indirectly integrate broad influences of other landscape-scale or climate factors, or also indirectly capture stream size, influencing the likelihood of fish presence. Frequently, species richness increases at lower elevations⁴⁰, and we suggest that elevation also contributes to species distribution limits, as is the case for the Endangered Species Act listed Bull Trout (Salvelinus confluentus)⁴¹. The multiple factors associated with elevation correspond to the relationship found for stream size, in which smaller streams are less likely to have fish. Ultimately, the intersection of stream size, slope, and elevation guides us to the upper extent of fish in streams.

Physical influences have been proposed to be more limiting to fish distributions upstream, such as near the upper extent of fish, whereas biological factors are probably more important downstream³³. Although 67 environmental predictor variables representing geologic, soil, climatic, and hydro-topographic conditions at local and patch scales were evaluated (Data S1), only the hydro-topographic variables of stream size, slope, and elevation are important to predicting the upper limit of fish in UPRLIMET. In fact, the top 9 models (Fig. 3; Data S3) relied on just four to five hydro-topographic variables, most of which were patch-scale variables or elevation at 1000 m, all of which incorporate a broader extent of influence. This suggests that local-scale variables that contribute to fish limits, including slope or riparian influences, may need to be further explored. In addition, some of the remaining 63 variables not present in UPRLIMET, such as precipitation and air temperature, are important drivers of within-network trout distributions and contribute to their connectivity. Some of these predictor variables appear in the 10th-ranked 26-variable RF-O-SR1 model (Data S2; Data S4; Data S8), but their influence appears to be dubious for isolating the upper limit and explaining variation in fish occurrence, because the model's MAE of upper limit was substantially higher than those of the 9 higher-ranked models (Fig. 3; Data S3) and the MCC of the associated RF-O sub-model was lower (Data S6). It is likely that other combinations of the 67 predictor variables, including precipitation, may be more important when this model development and evaluation framework is applied elsewhere, especially if those areas contain fishes or are places that are vulnerable to changing water temperatures and streamflow regimes. In addition, biological factors may be a concern in other watersheds, including invasive species and fish stocking, which can limit the longitudinal distribution and the upstream extent of fishes.

Given the large geographic extent of this study, we expected other variables such as precipitation to be more important drivers. However, owing to a combination of a wet water year, a lack of a precipitation gradient in the study area, coarse-grain data, and the location of fish in streams, this was not the case. For example, 2017 was a wetter than normal water year⁵³, and it may be that the gradient of precipitation variation in western Oregon was not strong enough to explain the variation in the spatial distribution of trout occurrence. All climate data, including the precipitation data, were sourced from relatively coarse-scale (800 m) PRISM data. The inability to adequately downscale precipitation to characterize how precipitation truly varies within and between patches, especially along elevational gradients, likely confounded how the model interprets the influence of precipitation. Trout occurrence was on perennial streams, which are likely far enough downstream of locations where variation in precipitation was the dominant influence on streamflow permanence that precipitation would not have been a factor.

Stream network structure plays a key role in the upper limits of fish. Upper limits for fish can occur at either lateral or terminal points¹³, and when mapping these points, differences were seen for UPRLIMET relative to other datasets. Lateral limits end in the tributary stream just above where it connects with a mainstem stream. Terminal limits include both mid-stream terminal limits, where fish drop out in the middle of a stream channel owing to a soft (i.e., transient barrier or petering out) or hard (i.e., waterfall) edge, and confluence terminal limits, where the upper limit of fish ends at the confluence. For example, when closely examining the 14 watersheds where we have overlapping information across various datasets and models, UPRLIMET and the Fransen optimal model¹³ exhibit substantial agreement in their lateral limits. However, the largest differences are in their terminal ends, especially terminal mid-stream limits, probably owing to hydro-topographic changes that contribute to fish occurrence at confluences, which are more pronounced than mid-stream. Accordingly, the logic in the stopping rule is likely important in identifying the specific upper extent of fish distributions in reaches that end mid-stream.
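To make the role of a stopping rule concrete, the following Python sketch walks upstream along a single flowline of reaches with predicted trout probabilities and returns the uppermost reach classified as fish-bearing. The rule shown here (a probability threshold plus a tolerated run of below-threshold reaches) is a hypothetical example and is not the stopping rule used in UPRLIMET; the function name, `threshold`, and `max_gap` are assumptions introduced for illustration.

```python
from typing import Optional, Sequence


def upper_limit_index(
    probs: Sequence[float],
    threshold: float = 0.5,
    max_gap: int = 2,
) -> Optional[int]:
    """Index of the uppermost reach classified as fish-bearing.

    probs: predicted probabilities ordered from downstream to upstream.
    The walk stops once more than `max_gap` consecutive reaches fall
    below `threshold` (a hypothetical rule, for illustration only).
    Returns None if no reach meets the threshold before the walk stops.
    """
    last_fish = None
    gap = 0
    for i, p in enumerate(probs):
        if p >= threshold:
            last_fish = i
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                break
    return last_fish


# Hypothetical probabilities along one flowline, downstream to upstream.
flowline = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
print(upper_limit_index(flowline, max_gap=2))  # → 3
print(upper_limit_index(flowline, max_gap=3))  # → 7
```

In this toy flowline, changing only the tolerated gap moves the predicted mid-stream limit by four reaches, which illustrates how the stopping logic alone can shift a terminal mid-stream limit even when the underlying probabilities are unchanged.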

Differences among databases for the upper distribution limits of fish come from both the upper limit points and depiction of fish-bearing reaches, underscoring the importance of having a shared map with common coverage of the fish extent across landscapes and ownerships. Differences among mapped distributions can result from source information, relating to whether it is modeled or occurrence data. Models, such as UPRLIMET, can be applied across a broad extent based on model parameters and training data, thereby offering broad coverage for distributions (and quantifiable error) across the landscape, ownerships, and jurisdictions. However, models are limited by accuracy and fit. As such, they can incorrectly predict distributions in some areas, especially if there are prediction features not yet trained with the model data where prediction would require extrapolation of the model. This makes both the training dataset and modeled extent important considerations, as models are only as good as the data used to develop them. Updating UPRLIMET with new data as it becomes available will help to expand the prediction domain, improve accuracy, and allow the model to do more interpolation than extrapolation.

Distributions based on occurrence information depend heavily on data availability, data quality, and access. Differences in data availability can lead to inconsistent coverage across landscapes and ownerships, with high coverage in some watersheds and low to no coverage in others. Inconsistent coverage can lead to errors that are difficult to quantify across landscapes, ownerships, and survey crews. Occurrence information also depends on the ability to survey watersheds and gain access across ownership types, including on private lands that do not have the same assurances of access as public lands, resulting in information asymmetry^{42, 43}. Data quality also depends on the spatial accuracy of the points of uppermost fish, which is a function of GPS quality and error and can drastically change the modeled results, as these points are used in the training dataset. Differences among mapped distribution limits also result from differences in field protocols for designating the last fish. For example, some crews note fish distribution limits where they see the last fish, whereas others note it upstream of where they saw the last fish, based on habitat features that would limit fish. The advent of LiDAR-derived DEMs and associated LiDAR-derived stream hydrography, like those available in much of western Oregon, has revealed additional flowlines in watersheds compared to previous topographic maps, which adds more potential tributaries to survey for fish-distribution assessments. When these previously unmapped tributaries are paired with a model, such as UPRLIMET, a common information set is available across landowners, managers, and agencies for the upper extent of fish. This helps policymakers determine where to apply regulations that support fisheries and forest management, based on the upper fish limit.

Next steps for applying and expanding the model include addressing current data gaps. More information and observations about the upper distribution limits of fish beyond western Oregon would be needed to properly expand the spatial scope of the model. The upper extent of fish is at the detection limit of many current technologies, including global navigation satellite systems (GNSS), geographic information systems (GIS), and LiDAR, especially in forested landscapes. Better precision of GNSS coordinates from observations would help greatly. From an ecological perspective, we could focus on fish distribution limits that vary seasonally or interannually to better understand which stream features and hydrologic parameters influence those endpoints. We also need information related to locations of barriers, including culverts, waterfalls, and knickpoints, to understand their influence on contemporary distributions. Incorporating variables representing riparian conditions, as well as leveraging higher-resolution DEMs (< 1 m) to better capture fine-scale geomorphic conditions such as pools and small barriers, especially in the headwater reaches, has the potential to further enhance the ability to resolve upper fish limits.

There are also opportunities to refine the underlying modeling methodology. Deep learning methods applied to structured data (e.g., data tables like those used in this study) are showing significant improvement over RF and gradient boosting methods⁴⁴, which may result in improved upper limit estimations because of the potentially improved prediction of trout distributions. Given that observation data are typically collected in advance of localized management operations, there may be advantages to implementing a Bayesian updating approach that could readily utilize new data, versus having to re-fit models each time new data become available⁴⁵. A more in-depth analysis of different variable combinations and model development algorithms might be possible via implementation of bias-correction bootstrapping cross-validation routines, which are considerably less computationally intense than our NSpCV routine⁴⁶. However, given the significant changes in error imposed by simply applying different stopping rules (Data S3), and the fact that most of the classification algorithms were producing greater than 90% prediction accuracy, it seems likely that refining the post-processing upper limit method by applying a secondary classification model, local maxima search algorithms, or additional conditional logic routines may yield the greatest net reduction in error with a relatively low development input.

In conclusion, we offer a prediction model development and evaluation framework for how to systematically consider the upper distribution limit of fish that could be broadly applied to watersheds and fishes. Distribution boundaries are fundamental for species conservation as well as for understanding how species might respond to environmental change; policymakers and managers reference distribution maps to determine management decisions, policies, and regulations. The distribution of fish influences policies that can impose costs in the form of forest harvest restrictions and provide benefits in the form of ecosystem services, including co-benefits to other species found with fish, such as crayfishes, stream-living amphibians, and mussels. UPRLIMET offers modeled trout distributions across the landscape and land ownerships through a shared map, stream flowlines, and data sources, all of which can be updated as new data are gathered. With this comprehensive prediction model development and evaluation framework, we (a) improve the information available to policymakers and managers by incorporating the best available LiDAR-derived hydro-topographic data, (b) train the model using field observations, (c) compare our findings with other methods that managers are using for estimating fish distributions (e.g., Fransen et al. (2006)¹³ and the 20% slope threshold), and (d) contrast UPRLIMET prediction results with multiple fish distribution datasets being used to identify the upper extent of fish. The availability and use of common models, data, and maps across land ownerships will streamline policy and management planning and activities.
