Chloroplast and repetitive nuclear DNA enrichment in the sedaDNA extracts
To the best of our knowledge, we generated the first large-scale target enriched dataset using sedaDNA extracted from sediments of multiple lakes. Sequencing of two datasets produced 325.5 million (M) quality-filtered paired-end DNA sequences. The first target enriched dataset, targeting both the chloroplast and a set of nuclear genes of Larix on 64 sedaDNA extracts and 19 negative controls from seven lake sediment records, resulted in 324 M quality-filtered paired-end sequences. The second target enriched dataset, targeting only the set of nuclear genes of Larix on four samples and two negative controls from an additional lake (Lake CH12), resulted in 1.5 M sequences. Quality-filtering of an additional published target enriched dataset29, targeting the Larix chloroplast genome on the same CH12 samples as used for the second dataset, added another 54 M sequences.
For the chloroplast enrichment, 390 thousand (K) sequences (1%) were classified as Larix at the genus or species level. The average coverage of bait regions was 19% at a mean sequence depth of 0.8. Sequencing of 19 library and extraction blank (negative control) samples resulted in 597 K paired-end sequences, of which 58% remained after quality filtering and deduplication. Of these, 38% were classified, with 0.03% of them (463 sequences) corresponding to the genus Larix. Negative controls from library preparation yielded no or very few (0 to 5) sequences mapping to the Larix chloroplast reference genome. Negative controls from DNA extractions, which were in several cases pooled into one library, showed a low number of sequences mapped to Larix (0 to 94 sequences, except 237 sequences in one case). Excluding all sequences found in negative controls from the sample analysis had no impact on the patterns resulting from the analysis of sample data. Detailed results and evaluation of negative controls are included in the Supplementary Information (Fig. S5) and Supplementary Data 1 and 2. Samples of all lake records with sufficient sequence coverage showed damage patterns typical of ancient DNA (see Supplementary Data 3).
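The two coverage figures reported above can be related with a short illustrative calculation. This is a minimal sketch, assuming that "coverage" denotes the fraction of bait positions covered by at least one sequence and that mean depth is averaged over all bait positions, covered or not; the function name and the toy depth vector are hypothetical and not taken from the study.

```python
def coverage_stats(depths):
    """Return (breadth, mean_depth) for a list of per-base read depths.

    breadth    = fraction of positions with depth > 0
    mean_depth = average depth over ALL positions (assumed definition)
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no bait positions given")
    covered = sum(1 for d in depths if d > 0)
    return covered / n, sum(depths) / n


# Toy depth vector (hypothetical): 2 of 10 positions are covered, each 4x.
toy_depths = [0, 0, 0, 4, 4, 0, 0, 0, 0, 0]
breadth, mean_depth = coverage_stats(toy_depths)
print(f"breadth={breadth:.2f}, mean depth={mean_depth:.2f}")
```

Under these assumed definitions, a low breadth combined with a mean depth below 1 implies that the covered positions carry several sequences each, while most bait positions remain uncovered.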
These results are comparable to those obtained by Schulte et al.29, where 36% of quality-filtered sequences were classified as Viridiplantae with 9% assigned to Larix. In contrast to Schulte et al.29, we raised the confidence threshold of taxonomic classification (a parameter defining the number of k-mers needed to produce a match against a taxon in the database), which drastically reduced the number of classified sequences but increased the confidence in the analysis36.
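The effect of a confidence threshold on k-mer based classification can be illustrated with a small sketch. It assumes a Kraken2-style scoring scheme, in which the score of a taxon is the share of a sequence's k-mers assigned to that taxon or its descendants, and the label is moved up the taxonomy until the score meets the threshold. The classifier is not named in this section, so the scheme, the function names, and the toy taxonomy below are all assumptions made for illustration.

```python
def lineage(taxon, parent):
    """List a taxon and all of its ancestors, from the taxon up to the root."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path


def clade_kmers(taxon, kmer_hits, parent):
    """Count k-mers assigned to a taxon or to any of its descendants."""
    return sum(n for t, n in kmer_hits.items() if taxon in lineage(t, parent))


def apply_confidence(candidate, kmer_hits, parent, total_kmers, threshold):
    """Walk up from the candidate until the clade score meets the threshold.

    Returns the accepted taxon, or None if the sequence stays unclassified.
    """
    for taxon in lineage(candidate, parent):
        if clade_kmers(taxon, kmer_hits, parent) / total_kmers >= threshold:
            return taxon
    return None


# Hypothetical taxonomy and k-mer hits for one sequence with 20 k-mers,
# 10 of which found no match in the database.
parent = {"root": None, "Pinaceae": "root", "Larix": "Pinaceae",
          "Larix gmelinii": "Larix"}
hits = {"Larix gmelinii": 2, "Larix": 3, "Pinaceae": 5}

for thr in (0.0, 0.2, 0.4, 0.8):
    print(thr, apply_confidence("Larix gmelinii", hits, parent, 20, thr))
```

In this toy case, raising the threshold first moves the assignment from species to genus and family level and finally leaves the sequence unclassified, which mirrors the trade-off between the number of classified sequences and the confidence in each assignment.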
To analyze the enrichment obtained by the nuclear gene bait set, taxonomic classification was repeated using a plant genome database including available Pinaceae genomes. The classification resulted in 716 K sequences assigned to Larix, increasing the previous results by 325 K sequences. However, almost no sequences mapped to the targeted baits (a maximum of five sequences for some samples). A closer inspection of unmapped sequences assigned to Larix revealed a high content of repetitive DNA. More specifically, taxonomically classified Larix sequences could be assembled to EulaSat1, the most abundant satellite repeat in the nuclear genome of Larix32,37. This short repeat with a 173 bp long motif is arranged in large arrays of tandemly repeated motifs and is exclusively present in larches32. Analysis of modern L. sibirica and L. gmelinii (western and eastern range) genomes reveals that EulaSat1 occurs in all of them, contributing 0.62% (L. sibirica), 2.52% (western range L. gmelinii), and 2.39% (eastern range L. gmelinii) of the genomes, respectively (Fig. S2). A comparison of the sequence proportions mapping to the repeat motif in the different datasets of Lake CH12 showed a specific enrichment of the repeat motif by the nuclear gene hybridization probe set (Fig. S3).
In total, 17 K sequences mapped to the repeat motif of EulaSat1. The abundance of all sequences mapped per sample is in agreement with the abundance of sequences mapped to the chloroplast genome, confirming the general history of forest development (Fig. 2). Analysis of the nucleotide frequencies in the repeat motif showed a high constancy over all samples (Fig. S4). This suggests high conservation of the EulaSat1 motif in Siberian larches over time and space. Although satellite repeats are reported to have a high sequence turnover, for larches it has been shown that repeat profiles between two geographically well-separated species—the European larch (L. decidua) and the Japanese larch (L. kaempferi)—are very similar32. The main satellite in all larches, EulaSat1, is believed to have greatly multiplied after the split of Larix from Pseudotsuga32. Given the ongoing hybridization between the three Siberian larch species, it is not surprising to find a consistent pattern of nucleotide frequencies in all samples.
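How nucleotide frequencies along a repeat motif can be tabulated is sketched below. This is a simplified illustration, assuming ungapped alignments given as a 0-based start position on the motif plus the aligned sequence; the function name and the toy alignments are hypothetical and do not reproduce the analysis behind Fig. S4.

```python
from collections import Counter


def position_frequencies(alignments, motif_length):
    """Per-position nucleotide frequencies along a repeat motif.

    alignments: iterable of (start, sequence) pairs, start being 0-based.
    Returns a list with one dict per motif position mapping A/C/G/T to its
    frequency, or None where no sequence covers the position.
    """
    counts = [Counter() for _ in range(motif_length)]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Ignore overhanging bases and ambiguous characters such as N.
            if 0 <= pos < motif_length and base in "ACGT":
                counts[pos][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: c[b] / total for b in "ACGT"} if total else None)
    return freqs


# Toy example on a 5 bp motif (the real EulaSat1 motif is 173 bp long).
toy_alignments = [(0, "ACGT"), (1, "CGT"), (0, "AC"), (2, "AT")]
freqs = position_frequencies(toy_alignments, 5)
print(freqs[2])  # position 2 is covered by G, G and A
```

Computing such a table per sample and comparing the tables across samples is one way to assess whether the motif composition stays constant in space and time.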
Off-target sequences in target enriched datasets have already been demonstrated to be useful for the analysis of high-copy DNA such as ribosomal DNA or plastomes34,38,39. A recent study on five modern sedges showed that target enriched sequencing data originally targeting a set of gene exons can also be used to study the repetitive sequence fraction and even infer phylogenetic relationships based on repetitive sequence abundance35. It has also been shown that sequence similarities between homologous repeat motifs can be used to reconstruct phylogenetic relationships among closely related taxa40,41. In the case of the Larix satellite EulaSat1 in our study, no change in nucleotide frequencies could be detected, either between locations or over time. However, our results show that the off-target fraction in target enriched sedaDNA datasets can hold valuable information and that repeat motifs in more diverse taxon groups could even be a target for enrichment. Specifically enriching for repeat motifs in sedaDNA extracts could enable the study of satellite repeat evolution and provide additional information on species abundance and phylogeography.
In the two target enriched datasets, sequences taxonomically classified to the genus Larix and mapping to the chloroplast and to the repeat sequence, respectively, show similar patterns of abundance (see Fig. 2). Compared with published metabarcoding and pollen data from the same locations, the Larix abundance patterns can be globally reproduced, underpinning the notion that sequence abundances in target enriched data can be used as good estimates of plant abundances. For older parts of the lake records, target enriched data show Larix where metabarcoding data were unable to detect a clear signal (see Fig. 2, lakes Billyakh, Bolshoye Shchuchye, Kyutyunda, and Lama). This shows that target enrichment is superior to metabarcoding when analyzing one taxonomic group in-depth, as it is less prone to errors caused by DNA degradation, which can impede primer binding if the molecule becomes too short. Also, independent of age, rare taxa mostly need multiple PCR replicates to be detected by metabarcoding42,43. Target enrichment, however, is more sensitive in identifying one focal taxon group, as the total target length can be much larger (e.g., a complete organellar genome) than for metabarcoding, and the DNA damage patterns are put to use to authenticate ancient DNA. In addition, the only limit on molecule length is the threshold applied in the bioinformatic analysis, for which we used 30 base pairs (bp), as opposed to a minimum molecule length of 85 bp for the Larix metabarcoding marker (the plant-specific trnL g/h marker44). Similarly, compared to traditional pollen analysis, target enrichment is more accurate at tracing a specific target group, as it is not dependent on pollen productivity. Especially in the case of Larix, pollen productivity is low and preservation poor, resulting in rare findings of its pollen in the sediments22,45.
This could explain why for Lake Bolshoye Shchuchye, only a single Larix pollen grain was retrieved throughout the core, whereas target enrichment and metabarcoding show a strong signal in the Holocene sediments (last ~12 ka BP). Target enriched data also record signals in MIS2 sediments; however, sequence counts are extremely low, and as this is the only record in which both of the other proxies fail to report a signal, it should be interpreted with caution.
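The consequence of the two minimum molecule lengths mentioned above (30 bp for the bioinformatic threshold, 85 bp for the metabarcoding marker) can be made concrete with a toy calculation. The fragment lengths below are invented for illustration and do not represent measured sedaDNA fragment length distributions; the function name is hypothetical.

```python
def usable_fraction(fragment_lengths, min_length):
    """Fraction of fragments that are at least min_length bp long."""
    if not fragment_lengths:
        raise ValueError("no fragment lengths given")
    kept = sum(1 for n in fragment_lengths if n >= min_length)
    return kept / len(fragment_lengths)


# Hypothetical fragment lengths (bp) of a degraded DNA extract.
toy_lengths = [28, 35, 42, 50, 61, 70, 88, 95, 120, 33]

enrichment_share = usable_fraction(toy_lengths, 30)      # 30 bp threshold
metabarcoding_share = usable_fraction(toy_lengths, 85)   # 85 bp marker
print(enrichment_share, metabarcoding_share)
```

In a strongly fragmented extract, most molecules pass the shorter threshold while only a minority is long enough to carry a complete marker, which is consistent with target enrichment recovering signals in older sediments.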
A wider pre-glacial distribution of L. sibirica
Chloroplast genomes of L. gmelinii and L. sibirica differ at 157 positions, which can be used to differentiate species in target enriched sedaDNA29. Here, we applied this approach to lake sediment records, which are distributed across Siberia (Fig. 1) and have time ranges back to MIS3, and thereby were able to track species composition in space and time for wide parts of the species ranges.
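The principle of using species-diagnostic positions can be sketched as follows. This is a minimal illustration, assuming a table of diagnostic positions with the base expected for each species and a list of bases observed at reference positions in the mapped sequences of one sample; the positions, bases, and function names are hypothetical, and a real analysis would additionally need to account for damage-induced substitutions typical of ancient DNA.

```python
def species_variant_counts(observations, diagnostic):
    """Count observed bases supporting each species at diagnostic positions.

    observations: iterable of (position, base) pairs from mapped sequences.
    diagnostic:   dict position -> (sibirica_base, gmelinii_base).
    """
    counts = {"sibirica": 0, "gmelinii": 0, "other": 0}
    for pos, base in observations:
        if pos not in diagnostic:
            continue  # position does not differ between the species
        sib_base, gme_base = diagnostic[pos]
        if base == sib_base:
            counts["sibirica"] += 1
        elif base == gme_base:
            counts["gmelinii"] += 1
        else:
            counts["other"] += 1
    return counts


def sibirica_fraction(counts):
    """Share of informative observations supporting L. sibirica, or None."""
    informative = counts["sibirica"] + counts["gmelinii"]
    return counts["sibirica"] / informative if informative else None


# Hypothetical diagnostic table and observations for one sample.
diagnostic = {100: ("A", "G"), 250: ("T", "C")}
observations = [(100, "A"), (100, "G"), (250, "C"),
                (250, "C"), (300, "A"), (100, "T")]

counts = species_variant_counts(observations, diagnostic)
print(counts, sibirica_fraction(counts))
```

Samples in which no mapped sequence covers a diagnostic position return None rather than a proportion, which corresponds to samples where species assignment is not possible despite the presence of Larix sequences.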
In lakes Billyakh and Kyutyunda, ca. 1500 km east of the current range of L. sibirica (Fig. 1), we found evidence for a wider distribution of L. sibirica around 32 and 34 ka BP in MIS3 (Fig. 3). Billyakh is situated in the western part of the Verkhoyansk Mountains, and Kyutyunda on the Central Siberian Plateau. Both lakes have low counts of Larix DNA sequences in their oldest samples, dated to 51 ka BP (Billyakh) and 38 ka BP (Kyutyunda), with variants of L. gmelinii. There is then a sudden rise in variants attributed to L. sibirica at 34 ka BP (Billyakh) and 32 ka BP (Kyutyunda), which persists in the following samples but strongly decreases in younger samples (Fig. 3). The rise in the L. sibirica DNA sequence variants coincides with a peak in sequence counts for Lake Kyutyunda. These signals suggest a rapid invasion of L. sibirica into the ranges of L. gmelinii in climatically favorable times and a local depletion or extinction of L. sibirica during the following harsher climates. Lake Billyakh pollen data suggest that the climate around 50–30 ka BP, associated with the MIS3 Interstadial in Siberia, was moister and warmer than in the latter part of the Last Glacial46.
Strong support for a wider pre-glacial distribution of L. sibirica comes from genetic analyses which show that it is genetically close to L. olgensis, today occurring on the Korean Peninsula and adjacent areas of China and Russia27,47. It is assumed that the L. sibirica-L. olgensis complex used to share a common range, which was disrupted and displaced when the better cold-adapted L. gmelinii expanded south and southwest during the more continental climatic conditions of the Pleistocene47,48. Furthermore, modern and ancient genetic studies suggest that the L. sibirica zone was recently invaded by L. gmelinii from the east in the hybridization zone of the species, as the climate cooled after the mid-Holocene thermal maximum13,23. Today, pure stands of L. sibirica do not form a continuous habitat, but occur in netted islands5, and morphological features of L. sibirica can be found in populations of L. gmelinii located at least a hundred kilometers east of the closest L. sibirica populations49. Macrofossil findings of L. sibirica in Scandinavia dated to the early Holocene point to the capability of rapid long-distance jump dispersal of this species50. Fossil L. sibirica cones dated to the end of the Pliocene and to the Pleistocene have also been found far east of its current range in several river valleys including Kolyma, Aldan, and Omolon, and even in the basin of the Sea of Okhotsk9. These indicate long-distance seed dispersal by rivers, which may also have assisted in successful establishment since the active layer is deeper close to rivers51,52. As mentioned earlier, L. sibirica is sensitive to permafrost and waterlogged soils. A warmer phase with a deeper thawed layer above the permafrost could have enabled L. sibirica to spread and establish in regions that today are part of the geographic range of L. gmelinii, as L. sibirica is reported to have higher growth rates than L. gmelinii13.
Larix gmelinii formed northern LGM refugia across Siberia
The possible survival of Larix in high latitude glacial refugia during the LGM is still under discussion4,53, although more and more evidence is reported in favor of the existence of such refugia17,20,21. The question of which of the Larix species formed these populations has hitherto been unanswered, as neither pollen nor established metabarcoding markers are able to distinguish between species in the genus Larix, and findings of fossilized cones identifiable to species are rare. By enriching sedaDNA extracts for chloroplast genome sequences, we are, to the best of our knowledge, able for the first time to distinguish between L. sibirica and L. gmelinii in glacial refugial populations.
From Lake Lama, located at the western margin of the Putorana Plateau (Taymyr Peninsula), we obtained a continuous record extending from 23 ka BP to today with varying sequence counts with minima around 18–17 ka BP and 13 ka BP. All samples prior to the Holocene show variations predominantly assigned to L. gmelinii (Fig. 3). Our results suggest a local survival of L. gmelinii in the vicinity of Lake Lama throughout the LGM, which is supported by low numbers of Larix pollen detected through this period. Both target enriched sequence data and pollen indicate an increase from ca. 11 ka BP54. Sparse Larix pollen in the bottom part of the record could be an indication of a possible refugial population (Fig. 2; ref. 54).
In Bolshoye Shchuchye, the westernmost lake of the study, situated in the Polar Ural Mountains, all Pleistocene samples similarly show a dominance of L. gmelinii sequence variations (Fig. 3). However, sequence counts for some samples are extremely low, and samples from 18 and 10 ka BP had such low counts of mapped DNA sequences that none of the variable positions between the species was covered. Although sequences mapped to the satellite repeat of Larix also showed a Pleistocene signal, this was not repeated in pollen or metabarcoding (Fig. 2), which instead indicate a treeless arctic-alpine flora for the late Pleistocene55,56. Especially for the sample of 20.4 ka, Larix sequence counts are extremely low and new investigations would be needed to confirm a local presence of Larix during the LGM.
The record of Lake Billyakh, situated in the western Verkhoyansk Mountain Range, likewise shows extremely low counts of sequences mapped to the reference for a range of samples, with no sequences covering the studied variable sites (45, 42, and 15 ka BP, 11–56 sequences mapped to non-variable sites). However, the pollen record for the same core shows a quasi-continuous record of Larix with a gap only occurring during the early LGM46 (25–22 ka BP, Fig. 2). Considering the known short-distance dispersal ability and poor preservation of Larix pollen, this strongly supports the presumed existence of a local glacial refugium at Lake Billyakh during that time20. Our samples also show a low but steady presence of Larix throughout the rest of the record, thus making glacial survival probable. The sample closest to the LGM (24 ka BP) indicates a clear dominance of L. gmelinii type variations.
The only exception to this general pattern is the record from Lake Kyutyunda, which is located on the Central Siberian Plateau west of the Verkhoyansk Mountain Range. In this record, LGM samples have extremely low counts but show variations assigned to L. sibirica and not to L. gmelinii as in the other lakes. In addition, the preceding sample dated to the MIS3 interstadial shows L. sibirica variation. A possible explanation is that relics of L. sibirica survived during the LGM, but were unable to spread after climate warming, possibly due to genetic depletion or later local extinction. The presence of reworked sediment material cannot be excluded either, as suggested by reworked pollen in the record57.
In conclusion, our data show almost exclusively L. gmelinii variation for samples covering the most severe LGM climate conditions. This is in agreement with the ecological characteristics describing the species as adapted to extreme cold. In contrast to L. sibirica, it can grow in dwarf forms and propagate clonally and potentially survive thousands of years of adverse climatic conditions58.
Postglacial colonization history—differences among larch species
Of great interest in the Larix history is not only the location and extent of possible high latitude glacial refugia but also if and to what extent these refugia contributed to the recolonization of Siberia after the LGM. Northern refugial populations could have functioned as kernels of postglacial population spread and recolonization, or spreading could have been driven by populations that survived in southern refugia. There are only a few studies on modern populations that report evidence for possible recolonization scenarios of Larix23,27,28. Here, we show that patterns differ between L. sibirica and L. gmelinii.
In the western part of our study region, two lakes are situated in the current distribution range of L. sibirica (Figs. 1, 4): Lake Bolshoye Shchuchye in the Polar Ural Mountains and Lake Lama on the Taymyr Peninsula. Despite this, both lakes show L. gmelinii for all Pleistocene samples, and a strong signal of L. sibirica variants only in the Holocene samples, with ages of 5.1 ka BP in Lake Bolshoye Shchuchye and 9.7 ka BP in Lake Lama (Fig. 3). The peak in L. sibirica also coincides with a peak of sequence counts in the respective sample, with a Larix pollen peak in Lake Lama sediments54, and with a peak in metabarcoding data for Lake Bolshoye Shchuchye55. This points to a migration of L. sibirica into its current northern area of distribution in the course of climate warming during the early Holocene, whereas glacial refugial populations consisted of L. gmelinii. Although the local survival of L. gmelinii around Lake Bolshoye Shchuchye remains uncertain due to extremely low sequence counts, it is clear that L. sibirica did not form a refugial population at this site.
A range-wide genetic study of L. sibirica analyzing chlorotypes and mitotypes of individuals23 found strong indications for rapid colonization of the West Siberian Plains from populations originating from the foothills of the Sayan Mountains in the south, close to the border of Mongolia, with only limited contribution from local populations. According to our results, the local populations could have been L. gmelinii populations, while the rapid invasion could have been L. sibirica.
In the eastern range of the study region, in the current range of L. gmelinii, namely at lakes Emanda, Satagay, and Malaya Chabyda, genetic variations throughout the records are less pronounced. Of the three lake records, only that from Lake Emanda reaches back beyond the LGM, but with a sampling gap for the time of the LGM. Therefore, it remains uncertain whether populations survived the LGM locally, or whether they were invaded or replaced by populations coming from the south with Holocene warming. The restricted variations throughout the record, however, hint at stable populations, which is supported by scarce pollen findings (Fig. 2).
Our data suggest that postglacial recolonization by L. sibirica did not start from high latitude glacial refugia, but from southern populations. In contrast, northern glacial populations of L. gmelinii could potentially have enhanced rapid dispersal after the LGM in their current area of distribution.
Environment likely plays a more important role than historical factors
The current boundaries of boreal Larix species arranged from west to east suggest a possible strong influence of the historical species distribution on the current distribution, whereas the gradient of increasing continental climate towards the east suggests a strong influence of the environment. By tracking species distribution in the past, spanning the time of the strongly adverse climate of the LGM, we can give hitherto unprecedented insights into species distribution history.
Several lines of evidence suggest a strong influence of the environment on species distribution: (1) signals for L. sibirica appeared in its current area of distribution as late as the Holocene warming, whereas cold Pleistocene samples are dominated by L. gmelinii type variation; (2) in lakes far east of its modern range, signals of variation typical for L. sibirica coincide with peaks in sequence counts (29 ka BP, Lake Billyakh; 32 ka BP, Lake Kyutyunda), which point to more forested vegetation around the lakes and consequently a more favorable climate at that time; and (3) samples dated to the severely cold LGM are dominated by variations of the L. gmelinii type.
This is in accordance with the different ecological characteristics described for the species. L. sibirica is sensitive to permafrost and only occurs outside of the zone of continuous permafrost5. In addition, L. sibirica achieves substantially higher growth rates and longer growth periods than L. gmelinii9,13 and can also produce more than twice as many seeds5. This potentially gives L. sibirica the ability to quickly react to climate change and outcompete the other species when the climate becomes more favorable.
In contrast, L. gmelinii is adapted to extremely low soil and air temperatures and is able to grow on permafrost with very shallow thaw depths. Its distribution almost completely coincides with continuous permafrost5, and even a restriction to permafrost areas is discussed, as it does not grow well in field trials on warmer soils or where there is a small temperature gradient between air and soil9. Due to this ecology, L. gmelinii is more likely to survive in a high latitude refugium, even during the severe continental climate of the LGM, which was most probably connected to continuous permafrost with shallow active-layer depths.
A study combining mitochondrial barcoding on sedaDNA and a modeling approach on Larix distribution in the Taymyr region around Lake CH12 concluded that the distributions of L. gmelinii and L. sibirica are most strongly influenced by stand density and thus by competition between the species, with L. gmelinii outcompeting L. sibirica at high stand densities13. As our study includes sediment cores reaching further back in time, we see a different trend. Instead of L. gmelinii, it was L. sibirica that dominated samples with high sequence counts, suggesting high stand density and a more favorable climate. A possible explanation for the different outcomes is the use of different organelle genomes. Epp et al.13 used a marker representing the mitochondrial genome, which is known to introgress more rapidly and as a consequence might reflect a long-past species history59,60.
Our findings have potentially important implications for the projections of vegetation-climate feedback. A warming climate in conjunction with a greater permafrost thaw depth could enable the replacement of L. gmelinii by L. sibirica. In contrast to L. gmelinii, L. sibirica is not known to stabilize permafrost, thus potentially further promoting permafrost thaw and with it the release of greenhouse gases, creating a positive feedback on global warming11. On the other hand, the substantially higher growth rates of L. sibirica in comparison to L. gmelinii would increase carbon sequestration, thus mitigating global warming13. This shows the importance of understanding species-specific reactions to climate change, which can result in great shifts in distribution. Target enrichment applied to sedaDNA is able to reveal the impact of past climate change on populations, and the increasing availability of modern reference genomes will further enhance its informative value.
Source: Ecology - nature.com