Cryogenic land surface processes shape vegetation biomass patterns in northern European tundra
Study areaThe study area (78 000 km2) is located between 68–71°N and 20–26°E, with strong climatic gradients, ranging from wet maritime to relatively dry continental, over tens of kilometers. The landscape of this climatically sensitive high-latitude region has been affected by multiple glaciations in the past. It includes the Scandes Mountains near the Arctic Ocean and low-relief areas to the south and east. The majority of the region (52%) is underlain by sporadic permafrost. Continuous and discontinuous permafrost are limited to the highest mountains of the study area (2% and 7%, respectively)17,26. This large proportion of sporadic, typically warm and shallow permafrost in the study area indicates that ground thermal response to climate warming can be rapid27. Our data do not cover low-relief plateaus of continuous permafrost (similar to northern Siberia and Alaska), where the generally high ice content of soil may lead to different and enhanced LSP responses under climate warming (e.g., ice wedge degradation and surface ponding) with altered AGB feedbacks43,44.LSP observationsThe data consist of 2917 study sites (each 25 m × 25 m) and includes previously combined observations (both in-situ [n = 581] and remote-sensing [n = 2336]) of the active surface features of three cryogenic LSP common in the area: cryoturbation, solifluction, and nivation. These LSP are mainly associated with seasonal freeze–thaw processes. Cryoturbation (i.e., frost churning) is a general term for soil movement caused by differential heave, and it creates typical surface features such as patterned ground, frost boils and hummocky terrain5. Solifluction is the slow mass wasting of surficial deposits through frost creep and permafrost flow, where gravitation causes frost-heaved soil to settle downwards during the summer thaw, creating features of lobes and terraces50. In addition, solifluction also includes gelifluction which is a mass wasting process caused by high porewater pressure in unconsolidated surface debris creating similar lobes and terraces5,50. We use the term nivation to collectively designate various weathering and fluvial processes which are intensified and depicted by the presence of snowbeds (which in general are melting in mid-July – late-August) and nivation hollows28,51. We expect the presence of such a snowbed to be an indication of active nivation processes, since in these environments the year-to-year spatial snow patters are fairly consistent31.The rationale behind LSP sampling is described in previous geomorphic studies which served as a basis for the used protocol52,53. Due to the large study domain, study objectives (focus on distribution of active surface features, not on activity itself) and modeling resolution (50 m × 50 m), we used a visual method to estimate the presence/absence of the mapped LSP. We used high-resolution aerial photography (spatial resolution of 0.25 m−2) and targeted field surveys (GPS accuracy ~5 m; Garmin eTrex personal navigator) to construct the LSP dataset. A binary variable (1 = presence, 0 = absence) was assigned to each LSP to indicate their evident activity (or absence). The activity/absence of the LSP was visually estimated based on the evidence in ground surface, indicated by e.g., frost-heaving, cracking, microtopography (e.g., erosional and depositional forms), soil displacement indicative to a process form (e.g., solifluction lobes, patterned ground), changes in vegetation cover and late-lying snow. Such indicators average the LSP activity over several years. Even small areas with slight indication of activity were considered active processes. However, such a protocol based on a visual assessment is susceptible for incorrect activity classification; solifluction may be active despite having a complete vegetation cover19 and the presence of late-lying snowbed, although being a good indication28, does not necessarily mean that active nivation processes are present.Remotely sensed vegetation indexFor obtaining remotely sensed vegetation index for the study area, we employed a maximum-value compositing approach. We downloaded all available clear sky (less than 80% land cloud cover) Landsat OLI 8 images overlapping the study area from June to September between 2013 and 2017 (total of 1086 scenes) from the United States Geological Survey (USGS) database (http:\earthexplorer.usgs.gov). Images were USGS surface reflectance products, which were preprocessed (georeferencing, projection, and atmospheric corrections) by USGS54. Landsat-8 satellite is the latest addition to the Landsat mission that has provided repeated land surface information globally since the 1970’s and is the most commonly used fine-scale satellite system for vegetation mapping. The native resolution of the Landsat OLI sensor is 30 m for the spectral bands used in the image processing steps of this study.Normalized difference vegetation index (NDVI), a widely used spectral index to estimate the amount of green vegetation, was calculated as55:$$({{{{{rm{rho }}}}}}{{{{{rm{NIR}}}}}}-{{{{{rm{rho }}}}}}{{{{{rm{red}}}}}})/({{{{{rm{rho }}}}}}{{{{{rm{NIR}}}}}}+{{{{{rm{rho }}}}}}{{{{{rm{red}}}}}})$$
(1)
where ρNIR and ρred are the surface reflectance for their respective Landsat bands, 0.851–0.879 (mu)m and 0.636–0.673 (mu)m.USGS provides pixel-based quality assessment bands for all surface reflectance products. These bands were used to mask clouds, snow, water, and other low-quality pixels from the individual NDVI scenes. Additionally, if the NDVI images still had unphysical values over 1 or under -1, these pixels and their surroundings of 100 m radius were excluded. We determined maximum values for each 30 m resolution pixel of the study area individually. After masking cloud, snow, and water from the scenes, obvious scattered erroneous NDVI values remained in some scenes. Therefore, we excluded the values outside the pixel-based 95% percentile prior to maximum composite.The CFmask cloud detection algorithm that is used to generate the quality assessment band has clear difficulties in distinguishing small snow patches from clouds. As such, a large portion of late-lying snow beds were repeatedly and incorrectly classified as clouds. Moreover, the CFmask algorithm creates buffers around the cloud pixels54, hence much information was lost around the snow patches that were incorrectly identified as clouds. After these processing steps, some pixels around the extreme late-lying snow beds had still too low number of NDVI records to provide reliable NDVI values for the maximum composite. To fill these small and scattered gaps in the initial maximum NDVI composite, we selected 74 mostly cloud-free scenes between August and September. For these 74 scenes, we manually digitized cloud masks to exclude cloud-contaminated pixels with high certainty. Moreover, every pixel must have passed the following quality checks to be included in the gap-filling composite: not classified as water in the USGS quality assessment band; normalized difference snow index (NDSI) value less than 0.4, and blue band reflectance less than 0.1 (to exclude snow); reflectance of red band between 0.03 and 0.4 (second check for water and snow, and deepest shadows); NDVI between 0 and 0.4 (lower threshold to exclude snow and water contamination; higher threshold to exclude erroneous values, as very late snowbed habitats always have very limited vegetation cover). Additionally, if the NDVI images had unphysical values over 1 or under -1, these pixels and their surroundings (200 m radius) were excluded. Pixels in the 74 selected images which passed these checks, were then used to create a secondary maximum NDVI composite that was used to fill the gaps in the initial maximum NDVI composite. The secondary composite comprised 0.4% of the pixels in the final composite. Among all 2917 LSP observation sites, 2.9% were located within the gaps in the initial maximum NDVI composite, and thus received their maximum NDVI values from the secondary NDVI composite.In the used Landsat data, the nivation sites were not covered by snow, but instead were associated with generally lower AGB values as nival processes affect the vegetation’s structure and composition (Supplementary Table 1).Above-ground biomass dataAbove-ground biomass (AGB) reference data were collected from two regions, with a total of 433 sites that represent an area of > 4000 km2 (Supplementary Fig. 9). The first dataset (hereafter BM region 1; centering to ca. 69°N, 21°E) was collected between 2008 and 2011, and the second dataset (BM region 2; centering to ca. 70°N, 26.2°E) between 2015 and 2017. Both study regions are representative of an arctic and alpine treeline ecotone and include data from mountain birch forest to barren oroarctic tundra56,57.The BM region 1 dataset consists of 309 field sites (each 10 m × 10 m), which are located around eight different massifs covering a wide range of environmental conditions (Supplementary Figs. 9–10). Sampling was performed in transects to cover various aspects of the slope (i.e., topoclimatic conditions), starting from the foothill of the mountain, and ending at the summit. A plot was systematically established at every 20 m increase in elevation and recorded with a GPS device. Four clip-harvest biomass samples (20 cm × 20 cm) were taken 5 m from the plot center in every cardinal direction. Two samples were used in bare mountaintops (north, south). The clip-harvest samples were dried for 48 h at +65 °C, and dry weight was recorded. The sample biomass values were converted to g m-2 and the average sample value was calculated for each site (Supplementary Fig. 9). The original BM region 1 dataset contains forest and treeline plots, but these were excluded from the final analyses due to an incomparable tree sampling strategy with BM region 2, which could introduce uncertainty into biomass estimates.The BM region 2 data were collected from three different massifs having an elevation range from 120 m to 1064 m (Supplementary Fig. 9). The biomasses were sampled from 102 sites (each 24 m × 24 m in size) that were chosen using a stratified sampling to cover gradients of thermal radiation (potential incoming solar radiation), soil moisture (topographic wetness index, TWI) and vegetation zone (forest, treeline, and alpine zones). Radiation and TWI were calculated from a 10 m digital elevation model (DEM, provided by the National Land Surveys of Finland and Kartverket, the Norwegian mapping authority), and assigned to one of three classes based on observation percentiles (breaks at 20% and 80%) leading to total of 27 strata. Vegetation zones were digitized based on aerial imagery. After the first field survey, 22 sites were added to account for vegetation types that were not sufficiently represented by the GIS-based stratification. Thus, the total sample size of the BM region 2 dataset is 124 AGB sites.The same clip-harvest sample protocol was used as in BM region 1; additional samples were also taken from 12 m in every cardinal direction, thus each site had eight AGB samples (Supplementary Fig. 9). Trees with diameter at breast height (DBH) greater than 20 mm were measured from a 900 m2 circular plot, which corresponds to the size of the NDVI product resolution. Large stems (DBH > 80 mm in the forest and 40 mm at the treeline) were measured from the whole plot, whereas smaller stems were measured from five subplots. Specifically, the center subplot was 100 m2, and the four subplots located at 8 m to every cardinal direction were each 12.5 m2. For the subplot observations, we used a plot expansion factor (900/150 = 6) to generalize the observations for the whole plot assuming a homogeneous forest structure i.e., each subplot stem represents six trees within the 900 m2 plot. A total of 98% of the measured stems were mountain birch (Betula pubescens ssp. czerepanovii), making it the most abundant species in the area. For predicting stem biomass, we used the average of three allometric equations58,59,60, in order to reduce the uncertainty related to the transferability of an individual allometric model. In addition, Populus tremula (1% of the observations) were found on low-altitude south-facing slopes, and Salix caprea (1%) in moist, nutrient-rich sites. Species-specific models61,62 were used to estimate their respective stem biomasses. Individual pines (Pinus sylvestris) were scattered in the area but were not present in any of the sampled plots.The plots of above-ground tree biomass were converted to g m−2 and added to the mean clip-harvest AGB to obtain the total vascular plant AGB for each site. The BM region 1 and BM region 2 datasets were combined, and the NDVI value was extracted from the site center coordinates.Spatial autocorrelation (SAC) is a common property of any spatial dataset and means that observations are related to one another by the geographical distance63. SAC in the model residuals violates the independence assumption commonly required by statistical models and can lead to inflated hypothesis testing and biased model estimates64. To investigate whether the plot-scale AGB data are spatially autocorrelated, we calculated semivariogram which describes the spatial dependency between the observations as a function of distance between the point pairs65. Semivariogram were calculated as:$${{{{{rm{gamma }}}}}}(h)=frac{1}{2N(h)}mathop{sum }limits_{i=1}^{{N}_{h}}{left(Zleft({s}_{i}right)-Zleft({s}_{i}+hright)right)}^{2}$$
(2)
where N(h) denotes the number of data pairs within distance h, and (Zleft({s}_{i}right)) is an observation (or model residual) in location i. For the calculation, we used R package gstat66 (version 2.0-0). A visual inspection of the semivariogram indicated spatial autocorrelation at short distances (“AGB” in Supplementary Fig. 11). Therefore, for the NDVI-AGB conversion, we used a generalized least squares modeling (GLS, as implemented in R package nlme67 [version 3.1-137]) that can explicitly account for SAC in the data. For the modeling, the AGB values were log(x+0.1) transformed. The GLS, where AGB was modeled as a function of NDVI, were fitted assuming an exponential spatial correlation structure:$$gamma left(hright)={c}_{0}+cleft(1-{e}^{-h/a}right)$$
(3)
where ({c}_{0}) is the difference between the intercept and origin (i.e., the “nugget” parameter in geostatistics), (c) is the amount of variance (i.e., the “sill”) and a represents the distance of spatial dependency (i.e., the “range”). The fitted GLS was as follows:$${{log }}left({AGB}right)=-1.038629+9.725572times {NDVI}$$
(4)
The estimated spatial correlation parameters were c0 = 0.516, c = 0.484 and a = 260.605, indicating that the distance of spatial autocorrelation extends to ca. 261 m. The semivariogram for the model residuals indicated a notable reduction in the amount of spatial autocorrelation compared to the AGB data (Supplementary Fig. 11). The fitted model explained 70.6% of the deviance in the data. When the predicted values were converted back to the response scale, the model explained 60.5% of the deviance. Therefore, for the subsequent analyses we use the above-ground biomass estimated by the model.Environmental predictorsIn addition to LSP, we used climate, topography, and soil predictors to model AGB. Gridded monthly average temperatures and precipitation data (1981–2010; spatial resolution 50 m × 50 m) based on a large collection ( > 950) of Fennoscandian meteorological stations were used in a spatial interpolation scheme17. Three climate predictors—growing degree days (GDD, °C, base temperature 5 °C), mean February air temperature (Tfeb, °C) and water balance (WAB, mm)—were calculated from the gridded climate data. WAB is the difference between total annual precipitation and potential evapotranspiration (PET), which was estimated from the monthly air temperature and precipitation data68:$${PET}=58.93times {T}_{{above}0^circ C}/12$$
(5)
These climatic predictors were selected to represent different aspects of climate that are critical for tundra vegetation: heat requirements, cold tolerance and moisture availability. In addition, two local scale topographic predictors were calculated from a DEM (spatial resolution of 50 m × 50 m, provided by the National Land Survey Institutes of Finland, Norway, and Sweden): topographic wetness index69 (TWI, a proxy for soil moisture) and potential annual direct solar radiation70 (MJ cm-2 a-1). Slope angle was initially considered as a potential predictor for AGB but was later omitted due to the strong correlation with TWI (-0.93, P ≤ 0.001). We also calculated peat cover (%) from a digital land cover classification71. Here, the native resolution of 100 m was resampled at 50 m to match the resolution of the climatic and topographic predictors, using nearest-neighbor interpolation. The binary peat cover variable was transformed to a continuous scale using a spatial mean filter of 3 × 3 pixels52. Finally, the topmost soil layer of a global gridded soil database72 was used to obtain pH data. Again, the original resolution of 250 m was also resampled to 50 m resolution using bilinear interpolation.Our fine-scale data revealed strong environmental gradients over the 78,000 km2 study area (Supplementary Table 1), most of which were only moderately inter-correlated (Spearman’s correlation coefficient More