### Real-world data

The dataset consisted of 2986 observations from 902 freshwater shallow lakes in Denmark and North America (data extracted from the LAGOSNE database on 22 February 2022 via R LAGOSNE package version 2.0.2)^{56} (Supplementary Fig. 9). The Danish lakes were sampled for one or several years from 1984 to 2020 (data extracted in October 2021 from https://odaforalle.au.dk/main.aspx) (Supplementary Fig. 10). Prerequisites for inclusion in the analysis were that lakes had been sampled for physical and chemical variables at least four times or at least three times over the growing season (May to September) for the Danish or North American lakes, respectively, had a mean depth of less than 3 m and were freshwater. Water chemistry samples were analysed using standard methods and data for total phosphorus (TP), total nitrogen (TN) and chlorophyll-a are included here^{57}. The mean and range of TP, TN and chlorophyll-a for the combined sites is given in Table 1, along with the values for each region separately.

To gain a longer-term perspective on the relationship between nutrients and chlorophyll-a, we calculated the across-year averages of the summer means of TP, TN and chlorophyll-a, sequentially increasing numbers of years included in the mean up to a total of a five-year mean, at which point there were only 99 lakes left in the dataset. In calculating the multi-year means we allowed a maximum gap of 2 years between observations (i.e. two observations could cover 3 years) to avoid including time series with too many missing years in between. Hence, only lakes with sufficient numbers of sequential data were included, resulting in a large drop in lake numbers as the length of the multi-year mean increased (Table 2).

### Numerical methods

#### Diagnostic tests or proxies of alternative equilibria

We modelled the response of chlorophyll-a to TP and TN using generalised-linear models^{58} with Gamma distribution and an identity link on untransformed data for single-year and multiple-year means up to 5-year means. We used the Gamma distribution, as chlorophyll fit this significantly better than a normal or log-normal distribution. We used psuedo *R*^{2} of the model along with the patterns of residuals, and finally, we plotted the kernel density of the chlorophyll-a values as diagnostics of the presence, absence or prevalence of alternative equilibria in the simulated and real work data.

To test how appropriate these diagnostics or proxies of alternative stable states in terms of how well they identify the existence of alternative stable states in randomly sampled multi-year data, we

- 1.
Simulated two scenarios for the main manuscript, with and without alternative stable states in the data, which were as close to the real-life data as possible. The results of these scenarios appear in the main text (please see details below in the “Data Simulation” section).

- 2.
We provide multiple scenarios with different degrees, or prevalence, of alternative stable states in the data, see simulations of alternative stable state scenarios. The results of these scenarios appear in Supplementary note 2.

### Hierarchical bootstrap approach

There are a large number of permutations of data, both real-world and simulated, that can provide a mean of the two to five sequential years from each lake in the time series data. It was vital to have a method that selects the data for analysis that provides a valid and comparable representation of both real work and simulated data and the models’ errors. In order to provide this we used a non-parametric hierarchical bootstrap procedure^{38}. The flowchart shows the data preparation and data analysis steps of the hierarchical bootstrap procedure (Fig. 4). In the first step (step 1 in Fig. 4), all possible longer-term means are calculated for each lake. To keep as much data as possible, we decided to allow for up to 2 years of gap in the data between years. Taking the 5-year mean data as an example, if data from a lake existed for the years 1991 and 1994−1997, a 5-year mean would be calculated for the years 1991, 1994, 1995, 1996 and 1997. However, if the time series would contain a larger gap, e.g. data would only exist for the years 1991 and 1995–1998, no 5-year mean could be calculated. After the selection procedure, all the 2-year, 3-year and 5-year means are transferred into a new table (step 2 in Fig. 4).

The procedure is the same for each temporal scale from 2-year means to 5-year means. For the example of 5 mean years, lakes are randomly sampled from the full 5-year mean dataset in step 2 (Fig. 4) with replacement up to the number of lakes as in the original dataset, for the 5-year means 99 (step 3a). Here, the same lake can appear multiple times or not at all. This step is common for every bootstrap procedure^{59}. However, since we have nested data (5-year means within lakes), we need a second step, in which for every resampled lake in step 3a, one 5-year mean is chosen (step 3b in Fig. 4). Then the three GLM models are produced from the randomly selected data in step 3c (Fig. 4). These steps are then repeated 1000 times to get a good representation of the uncertainties of the model. To ensure a fair comparison between single-year data and their equivalent multi-year mean data, we repeated the bootstrap procedure with single years only using only the lakes for which we also calculated multi-year means. To take the five-year mean as an example, there were 99 lakes where we could calculate at least one 5-year mean observation. First, we ran the bootstrap procedure to calculate 5-year mean values of TP, TN and chlorophyll-a (1000 times) and then took single years’ values of TP, TN and chlorophyll-a (1000 times) from exactly the same 99 lakes. With this approach, exactly the same datasets with the same lakes and observations within lakes are used for the calculation of the multi-year means and their single-year counterparts, making for a robust analysis. The GLM models did not always converge. If either the TP, TN or TP*TN model with interaction did not converge, the iteration was not used in further analysis. The number of converging models equal for each iteration of random samples is given in the results.

The described hierarchical approach is the best way to reflect the structure of the original data. A simple, non-hierarchical bootstrap would favour lakes with more five-year means over lakes with fewer five-year means, simply because these make up a larger part of the data. Furthermore, sampling without replacement at the lake level would result in five-year means from lakes with few data dominating the produced random dataset, as every lake would be sampled every time, which then would result in high model leverage of 5-year means from lakes with less data. In contrast, the hierarchical procedure ensures that every lake has the same chance to end up in the randomly sampled bootstrap, in the second step, it ensures that of each sampled lake, every 5-year mean has the same chance to end up in the random dataset. These notions are in agreement with the findings of an assessment on how to properly resample hierarchical data by non-parametric bootstrap^{38}.

### Data simulation

#### General approach of simulation assumptions and procedures

We generated random scatter for the generalised-linear model based on Gamma distributions for two different “populations” of lakes with two different intercepts and slopes. At first, we calculated the linear equations for the two populations:

For each population *i* and *j*, 99 samples (equalling the number of lakes in real-life data with 5-year means, *n*_{lake}) were generated with a specific number of data points depending on the scenario (*n*_{year}) each, hence *n*_{lake} = 1−99 for each population of lakes, e.g. with 20 years (*n*_{year} = 20) each.

We found the real nutrient data to be normally distributed, with total nitrogen (TN) having a range between 0.33 and 4.93 mg/L and a constant coefficient of variation (CV, with a mean CV of 0.35) across this range (the same is true for total phosphorus (TP) at a shorter range). Hence, for each *n*_{lake}, the *x* for the *n*_{year} = 20 were generated based on the mean range (mean per lake of the real-life data) and CV (0.35) from the real-life TN concentration data, hence with a range of 0.33 to 4.33 mg/L. Therefore the values and random variability of *x* in the simulations are close to the true values of the TN concentrations. The *x* is then fed into the linear equations above.

To the resulting *y*_{i} and *y*_{j} we added random noise based on the Gamma distribution (using the rgamma function in R). We used a Gamma distribution because the Chlorophyll-a concentration also follows a Gamma distribution. The variability of a Gamma distribution is expressed by the shape variable. The variability of chlorophyll-a, its shape value, equals 2.63. This shape value was used in the Gamma distribution of *y*_{i} and *y*_{j}. The final calculated *y*_{i} and *y*_{j} had therefore a random rate calculated as shape/*y*_{i} or shape/*y*_{j}. Hence, their variability in the y dimension was close to the true chlorophyll-a variability.

The data from both lake populations were then pooled and randomly sampled using the same hierarchical bootstrap procedure with 500 iterations for the scenarios in the supplementary materials and with 1000 iterations for main text simulation scenarios, which is identical to what was done for the real-world data.

#### Simulation scenarios based on characteristics of real-world data

The real-world 5-year mean data consisted of 99 lakes with 5–20 years of data for each lake. For the simulation scenario in the main text, we therefore randomly sampled between 5 and 20 data points for each of the 99 simulated lakes based on the *x* distribution described above. Intercepts and slopes of the simulation, resembled the range of the true data (see scatter plots in Fig. 2 of the main manuscript).

In the alternative stable state scenario, we chose two slopes and intercepts for different populations of lakes:

Population i:

*ai*= 0,*bi*= 40Population j:

*aj*= 50,*bj*= 120

We based the slopes and intercepts of the ASS scenarios on the diagnostic combination defined by Scheffer and Carpenter^{7} which propose an abrupt shift in (a) the time series, (b) the multimodal distribution of states and (c) the dual relationship to a controlling factor. Here, the idea is that an ecosystem will jump from one state to the next at the same (nutrient) conditions (different intercept and/or slope, condition a within ref. ^{7}), where any change in the nutrient will have different effects on algae or macrophytes (best represented by different slope, condition c), resulting in a multimodal distribution of the response (condition b). Hence, simulations are in line with what is predicted for ASS, but we took great lengths to also show other possibilities with the simulations in the Supplementary information to ensure we did not overlook any occasional occurrence of alternative equilibria.

Here, the appearance of alternative stable states in the data could happen at any point in the time series of a single lake, or the entire time series could include only one of the two alternative stable states. To accommodate these alternative stable state constellations (for each of which we made a separate simulation scenario, (see Supplementary Note 2, “Simulations of alternative stable state constellations”), we forced the alternative stable state scenario to be constructed of 1/3 of data with one state, 1/3 of data with the second state and 1/3 of data where both alternative states could occur. In the latter case, the alternative stable state appeared after the first 20% but before the last 20% of the time series. Since the variability and range of *x* (nutrient) and *y* (chlorophyll -a response) is simulated as close as possible to the real-world data in all scenarios, the measures taken here (variable time series and combination of different alternative stable state scenario constellations) produce a simulation as close to the real-world data as possible. Specifically, we found the real-world nutrient data to be normally distributed, with total nitrogen (TN) having a range between 0.33 and 4.93 mg/L and a constant coefficient of variation (CV, with a mean CV of 0.35) across this range (the same is true for total phosphorus (TP) at a shorter range). Hence, for each simulated lake, the x were generated based on this mean range and CV. Furthermore, the resulting *yi* and *yj* were randomised by using a Gamma distribution (using the rgamma function in R). We used a Gamma distribution because the chlorophyll-a concentration also follows a Gamma distribution. The variability of a Gamma distribution is expressed by the *shape* variable. The variability of chlorophyll-a, its *shape* value, equals 2.63. This *shape* value was used in the Gamma distribution of *yi* and *y*. The final calculated *yi* and *yj* had, therefore a random *rate* calculated as *shape/yi* or *shape/y*. Hence, their variability in the y dimension was close to the true chlorophyll-a variability.

For the scenario without alternative stable states, both populations of data had the same intercept and slope:

Population i:

*ai*= 0,*bi*= 40Population j:

*aj*= 0,*bj*= 40.

Please see Supplementary Note 2 for further simulations of different potential constellations of alternative states. There we show that our approach finds alternative stable states in response to nutrient concentration, even if they appear in time series from different lakes.

#### Assessment of diagnostic tests or proxies of alternative equilibria

We modelled the response of chlorophyll-a to TP and TN using generalised-linear models^{3} with Gamma distribution and an identity link on untransformed data for single-year and multiple-year means up to 5-year means. We used the Gamma distribution, as chlorophyll fit this significantly better than a normal or log-normal distribution. We used *R*^{2} of the model along with the patterns of residuals, and finally, we plotted the kernel density of the chlorophyll-a values as diagnostics of the presence, absence or prevalence of alternative equilibria in the simulated and real work data.

The comparison of how the diagnostics/proxies of alternative stable states respond to the variation in the prevalence of alternative equilibria in the simulated datasets provides a robust assessment of their ability to identify both the presence and absence of alternative equilibria. It is the response of these diagnostic tests over time, with the increase in the temporal perspective as more years are added to the mean values of TP, TN and chlorophyll-a, that are key to the identification of the presence and or absence of alternative equilibria in a given dataset. The simulations show that a dataset which contains alternative equilibria will show (1) no improvement in *R*^{2} as the temporal perspective of the data increases (more years in the multi-year mean); (2) an increased bimodality in the residuals of the models of nutrients predicting chlorophyll-a will increase as more years are added to the multi-year mean and (3) the kernel density function of chlorophyll-a will display increasingly bimodality as more years are added to the mean. In the absence of alternative equilibria, the patterns differ with an *R*^{2}, and increase in unimodality of residuals and a consistent unimodal pattern in the kernel density function. Thus, the diagnostic tests provide a robust test of both the presence and absence of alternative equilibria in a given dataset.

### Alternative stable state assessment for real data with limited data range

It could be the case that alternative stable states do not appear in the full dataset but only in a limited TN and TP concentration range. We filtered and re-analyzed the data, only keeping data points within the following two ranges: – TN concentration = 0.5−2 mg/L–TP concentration = 0.05−0.4 mg/L. In the filtered data, 1329 out of the original 2876 single-year data points, 289 out of 1028 3-year mean data points and 212 out of the 864 five-mean year data points remained. The filtered data consisted of data points from 550, 48 and 27 lakes for the single-year data, 3-year means and 5-year means, respectively. The smaller range resulted in lower *R*² of the models, yet the pattern that multi-year means result in higher *R*² compared to single-year data was largely consistent, apart from the 5-year mean TN models for which both, the single-year and mean data resulted in very low *R*² (Supplementary Fig. 6). Furthermore, due to the lower number of samples, the errors of all proxies are higher, making conclusions more difficult than for the full data. Still, we do not see any clear indication of alternative stable states in the scatter plots (two groups of dots are not appearing (Supplementary Fig. 5), the Kernel density plots (or model residuals (Supplementary Fig. 6)). i.e. no signs of bimodality in residuals or Kernel density plots. Please see details on this analysis in the supplementary material.

Details and the R code for the steps for the random multi-year sampling can be found in the supplementary materials.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source: Ecology - nature.com