Study sites and sampling
We sampled gas and soil in 29 regions throughout the A (rainy tropical), C (temperate), and D (boreal) climate types of the Köppen classification from six continents during the vegetation period between August 2011 and June 2018, following a standard protocol26. According to the protocol, the gas and soil samples were collected from locations in public domain or in previous agreement with the local community and/or property owner. The samples were exported from the origin countries and imported to Estonia, EU in cooperation with customs officers of the respective states, following the legal provisions of soil export and import, specifically exemptions for scientific purposes. To capture the full range of environmental conditions in each region, we established 76 wetland soil sites under different vegetation (mosses, sedges, grasses, herbs, trees, and bare soil) and land-use types (natural—raised bog, fen, and forest; agricultural—arable, hay field and pasture; and a peat extraction area) (Fig. 1a; Supplementary Data 1). We used a four-grade land-use intensity index to quantify the effect of land conversion: 0, no agricultural land use (natural mire, swamp, or bog forest); 1, moderate grazing or mowing (once a year or less); 2, intensive grazing or mowing (more than once a year); and 3, arable land (directly fertilized or unfertilized). The vegetation and land-use intensity types and the land-use intensity index were estimated from observations and contacts with site managers and local researchers.
Within the sites, we established 1–4 stations 15–500 m apart to maximize the captured environmental variation. Each of the 196 stations were equipped with 3–5 opaque PVC 65 L truncated conical chambers 1.5–5 m apart and an observation well (perforated, 50 mm diameter PP-HT pipe wrapped in geotextile; 1 m in length). From each of the 645 chambers, N2O fluxes were measured following the static chamber method37 using PVC collars (0.5 m diameter, installed to 0.1 m depth in soil). Stabilization of 3–12 h was allowed before gas sampling to reduce the disturbance effect of inserting the collars on fluxes. The chambers were placed into water-filled rings on top of the collars. Gases were sampled from the chamber headspace into a 50 mL glass vial every 20 min during a 1-h session. The vials had been evacuated in the laboratory 2–6 days before the sampling. At least three sampling sessions per location were run within 3 days. Water-table height was recorded from the observation wells during the gas sampling at least 8 h after placement. Soil temperature was measured at the 10 and 20 cm depth.
Soil samples of 150–200 g were collected from the chambers at 0–10 cm depth after the final gas sampling, and transported to laboratories in Tartu, Estonia. The homogenized samples were divided into subsamples for physical–chemical analyses and DNA extraction. The samples for chemical analyses were stored at 4 °C and microbiological samples were stored at –20 °C. DNA extraction was provided at the Tartu University environmental microbiology laboratory (see details below). Using a PP-HT plastic cylinder, intact soil cores (diameter 6.8 cm, height 6 cm) for the N2 analysis with the He–O2 method38 were collected from the topsoil (0−10 cm) inside 252 chambers at 26 sites, starting from 2014. Samples from different climates were run at respective temperatures. During transport, the soil samples were kept below the ambient soil temperature from which they were collected.
Gas flux analyses
The gas samples were analyzed for N2O concentration within 2 weeks using two Shimadzu GC-2014 gas chromatographs equipped with ECD, TCD, and a Loftfield-type autosampler. The N2O fluxes were determined on linear regressions obtained from consecutive N2O concentrations taken when the chamber was closed, using p < 0.05 for the goodness of fit as a quality threshold for the linear regression. During the quality control, in cases of insignificant regression (p > 0.05 we removed one outlier. If the regression remained insignificant but the flux value fell below the gas-chromatography measuring accuracy (regression change of N2O concentration δv < 10 ppb), we included it in the subsequent analyses as a zero value.
The helium atmosphere soil incubation technique30 was used to measure potential N2 fluxes from soil cores. The cylinders with intact soil cores were placed into special gas-tight incubation vessels located in a climate chamber. Gases were removed by flushing with an artificial gas mixture (21.0% O2, 358 ppm CO2, 0.313 ppm N2O, 1.67 ppm CH4, 5.97 ppm N2, and He). The new atmosphere equilibrium was established after 12–24 h by continuously flushing the vessel headspace with the artificial gas mixture at 20 mL/min. The flushing time depended on the soil moisture. Incubation temperature was kept similar with the field conditions. The gas-chromatograph (Shimadzu GC-2014) equipped with a thermal conductivity detector was used to measure N2 concentration in the mixture of emitted gases accumulated in the headspace (start value, 40, 80, and 120 min as final value) of the cylinder after 2 h of closure. The gas concentration in the chambers increased in a near-linear fashion and linear regression was applied for calculation of the fluxes. The flux measurements with r2 of 0.81 (p < 0.1) or greater were used.
Soil physico-chemical analysis
Plant-available phosphorus (P, NH4-lactate extractable) was determined on a FiaStar5000 flow-injection analyzer. Plant-available potassium (K) was determined from the same solution by the flame-photometric method and plant-available magnesium (Mg) was determined from a 100 mL NH4-acetate solution with a titanium-yellow reagent on the flow-injection analyzer. Plant-available calcium (Ca) was analyzed using the same solution by a flame-photometrical method. Soil pH was determined using a 1 N KCl solution; soil NH4−N and NO3−N were determined on a 2 M KCl extract of soil by flow-injection analysis (APHA-AWWA-WEF, 2005). Total nitrogen and carbon contents of oven-dry samples were determined by a dry-combustion method on a varioMAX CNS elemental analyzer (Elementar Analysensysteme GmbH, Germany). Organic matter content of dry matter was determined by loss on ignition. We determined soil water content (SWC) from dry matter content and empirically established bulk densities of mineral and organic matter fractions.
DNA extraction, DNA library preparation, and sequencing
DNA extraction was performed from 0.2 g of frozen soil samples (homogenized) using the Qiagen DNeasy PowerSoil Kit (12888-100), following the manufacturer’s recommendations. DNA concentrations were measured with Qubit™ 1X dsDNA HS Assay Kit using Qubit 3 fluorometer (Invitrogen). Altogether 645 individual soil samples were selected for metabarcoding of bacteria, archaea, and eukaryotes. For bacteria, we used the primers 515F (5′-GTGYCAGCMGCCGCGGTAA-3′) and 806RB (5′-GGACTACNVGGGTWTCTAAT-3′) to amplify the variable V4 region of the 16S rRNA gene39. Although these primers co-amplify archaea to some extent, we sought to specifically amplify a longer portion of their 16S rRNA gene to capture their full diversity, using the primers SSU1ArF (5′-TCCGGTTGATCCYGCBRG-3′) and SSU1000ArR (5′-GGCCATGCAMYWCCTCTC-3′)40. To amplify a broad range of eukaryotes, we used the primers ITS9mun (5′-GTACACACCGCCCGTCG-3′) and ITS4ngsUni (5′-CGCCTSCSCTTANTDATATGC-3′) that cover the V9 variable region of the 18S rRNA gene and the full internal transcribed spacer (ITS) region41. Both the forward and reverse primers were tagged with a 12-base multiplex identifier (MID), except in the case of archaea where only the forward primer was tagged with MID. All PCRs were performed in two replicates using 5 × HOT FIREPol® Blend Master Mix (Solis BioDyne, Tartu, Estonia) in 25 μl volume. By default, the bacteria, archaea, and eukaryotes were amplified using 25, 35, and 30 cycles, respectively. In case of no amplification, two or five extra cycles were added, or DNA was re-extracted and re-purified. Thermal cycling included an initial denaturation at 95 °C for 15 min; 25–40 cycles of denaturation for 30 s at 95 °C, annealing for 30 s at 55 °C, elongation for 1 min at 72 °C; final elongation at 72 °C for 10 min; and storage at 4 °C. The two replicates of each reaction were pooled and visualized on TBE 1% agarose gel.
The bacterial amplicons were sequenced using the Illumina NovaSeq platform at 2 × 250 bp paired-end mode. Illumina amplicon libraries were generated using TruSeq DNA PCR-Free High Throughput Library Prep Kit with TruSeq DNA CD Indexes (Illumina). To increase identification accuracy and coverage, the archaeal and eukaryote amplicons were sequenced using a long-read sequencing technology on PacBio Sequel II platform40,41. SMRTbell library preparation followed the Pacific Biosciences Amplicon library preparation protocol. Metabarcoding analysis was repeated for samples yielding <5000 prokaryotic reads (Illumina), <500 archaeal reads (PacBio), or <1000 eukaryote reads (PacBio).
For the functional metagenome analysis, three replicate soil samples per station were pooled based on equimolar amount of DNA. Library preparation and indexing of each 196 pooled samples was performed using Nextera XT DNA Library Prep Kit in combination with Nextera XT Index kits v2 (Illumina). Metagenomes were sequenced based on the shotgun approach to an expected depth of 5,000,000 reads using Illumina NovaSeq with 2 × 150 bp paired-end mode. The samples with <1,000,000 reads were subjected to resequencing.
Quantitative PCR
We used qPCR to quantity the absolute abundance of bacterial and archaeal 16S rRNA genes as well as the key genes involved in N cycle pathways, including denitrification (nirS, nirK, nosZ clade I, and nosZ clade II), N fixation (nifH), dissimilatory nitrate reduction to ammonia (DNRA; nrfA), ammonia oxidation (bacterial amoA, archaeal amoA, comammox amoA), and anammox- and n-damo-specific 16S rRNA genes (Supplementary Fig. 7). The qPCR assays were performed using RotorGene® Q equipment (Qiagen, Valencia, CA, USA). The qPCR method was performed following ref. 34. Briefly, the qPCR reactions were performed in 10 μL volume containing 5 μL Maxima SYBR Green Master Mix (Thermo Fisher Scientific Inc., Waltham, MA, USA), an optimized concentration of forward and reverse primers, 1 μL of template DNA and sterile distilled water. The gene-specific primer sets, optimized primer concentrations and thermal cycling conditions for each target gene are shown in Supplementary Data 8. The quantification data were analyzed with RotorGene Series Software (version 2.0.2; Qiagen, Hilden, Germany) and LinRegPCR program (version 2020.0)42. The gene abundances were calculated as a mean of fold differences between a sample and each 10-fold standard dilution in respective standard as recommended by ref. 42; gene abundances were reported as gene copy numbers per gram of dry soil.
Bioinformatics
Metabarcoding
Illumina MiSeq sequences were analyzed using LotuS software43 following ref. 44. Briefly, the reads were demultiplexed and quality-filtered by trimming individual reads to 170 bp and removing reads with an accumulated error >2 or an estimated accumulated error >2.5 at a probability of ≥0.01. To pass to the next step, each unique read (reads preclustered at 100% identity) was required to be present at least eight times in at least one sample, four or more times in at least two samples, or three or more times in at least three samples. Chimeric OTUs were removed based on both reference-based and de novo chimera checking algorithms as implemented in uchime45. The resulting OTUs were taxonomically annotated by aligning their sequences with Lambda46 to SILVA v135 database47 and the LotuS least common ancestor (LCA) algorithm (options: -p miSeq derepMin 8:1,4:2,3:3 –simBasedTaxo 2 –refDB SLV -thr 8). For processing PacBio sequencing data, PipeCraft48 was used as follows. Raw sequencing data was demultiplexed via mothur (version 1.36.1)49 module in PipeCraft by allowing one mismatch to tag region (i.e. to index sequence that was used for multiplexing); quality filtering was performed using vsearch (version 1.11.1)50 module with maximum expected error threshold of 1 (–fastq_maxee = 1) and discarding sequences with ambiguous bases (–fastq_maxns = 0); putative chimeric reads were discarded using vsearch uchime_denovo algorithm; prior clustering, full length ITS reads without conservative regions (18S and 28S rRNA genes; i.e. primer-binding sites) were extracted using ITSx software (version 1.0.11)51; using UPARSE (version 8.1.1861), sequences were clustered to OTUs at 98% sequence similarity where singletons (clusters with only one sequence) were removed during the process (minsize = 2). Representative sequences of OTUs were taxonomically annotated based on the best blast hit against UNITE database (version 8)52 followed by the LCA algorithm. For statistical analyses, we retained 645, 440, and 638 samples that yielded sufficient sequencing depth for Illumina 16S data (bacteria and archaea), PacBio 16S data (archaea) and PacBio ITS (fungi), respectively.
Metagenomics
Analysis of metagenomic reads was done using MATAFILER pipeline53. Briefly, reads obtained from the shotgun metagenomic sequencing of peat samples were quality-filtered by removing reads shorter than 70% of the maximum expected read length (150 bp), with an observed accumulated error >2 or an estimated accumulated error >2.5 with a probability of ≥0.01, or >1 ambiguous position. Using sdm software (version 1.46)43, reads were trimmed if base quality dropped below 20 in a window of 15 bases at the 3′ end, or if the accumulated error exceeded 2. Altogether 196 samples produced sufficient quantity of reads and were retained for statistical analyses. To estimate the functional composition of each sample, we implemented a similarity search approach using DIAMOND (version 2.0.5; options -k 5 -e 1e-4 –sensitive) in blastx mode54. Prior to that, the quality-filtered read pairs were merged using FLASH (version 1.2.10)55. The mapping scores of two unmerged query reads that mapped to the same target were combined to avoid double counting. In these cases, the hit scores were combined by averaging the percent identity of both hits. The best hit for a given query was based on the highest bit score and highest percent identity to the subject sequence. Using this method, we calculated the relative abundance of (clusters of) orthologous gene groups (OG) by mapping quality-filtered reads against the eggnog database (version 4)56. We also calculated metagenomic relative abundances (i.e. miTag57) of different taxonomic groups based on small subunit (SSU) rRNA genes. For this, SortMeRNA (version 2.0)58 was used to extract and blast search rRNA genes against the SILVA SSU database (v128). Reads approximately matching this database with e < 10−4 were further filtered with custom Perl and C++ scripts, and merged using FLASH. In case read pairs could not be merged, the reads were interleaved such that the second read pair was reverse complemented and then sequentially added to the first read. Of these preselected reads, 50,000 reads were fine-matched the Silva SSU database using Lambda and the lowest common ancestor (LCA) algorithm adapted from LotuS.
Genomic analysis
The taxonomic analysis revealed that a few microbial lineages may disproportionally outperform the community functional diversity of microbes in affecting ecosystem biogeochemistry. Thus, we followed a trait-based approach to confirm our findings. We downloaded 385 complete archaeal genomes from NCBI as of 10/7/2020 (search terms: archaea[Organism] AND “complete genome”). These were used to build a reference database for a BlastN search to identify corresponding genomes and functional annotations of our archaeal OTUs. In addition, to better understand the potential functions of different archaeal lineages in N cycling, the functional annotation of all available archaeal genomes was retrieved from the Integrated Microbial Genomes and Microbiomes database (img.jgi.doe.gov) as of 15/7/2020.
Data analysis
To account for differences in sequencing depth across samples, diversity indices (Shannon diversity index) were calculated based on rarefied abundance matrices (metabarcoding datasets) in vegan package59 of R (version 2.5-6). Multivariate analyses were performed using Bray-Curtis dissimilarity on normalized taxa abundance matrices in vegan. All raw P-values of multiple tests were corrected using Benjamini–Hochberg method. Taxonomic abundance data were normalized using Hellinger transformation as implemented in vegan.
To test the effect of biotic variables on N2O emissions, we used Spearman correlation analysis components to identify the bacterial and archaeal taxonomic lineages and fungal OTUs most strongly associated with N2O emissions. Functional gene (OG) composition and taxonomic community matrices were normalized by library size using Hellinger transformation. We subsequently used partial least squares (PLS) analysis to predict N2O emissions based on taxonomic groups, which allows the dimensionality of multivariate data to be reduced into PLS components using plsdepot package60 of R (version 0.1.17). Prior to this, we performed a backward variable elimination procedure to remove variables with low explanatory power (VIP threshold < 1), as implemented in plsVarSel package61 of R (version 0.9.6).
For univariate analysis, the best predictors of the diversity and relative abundances of taxonomic and functional groups were identified using a machine learning approach implemented in randomForest package62 of R (version 4.6-14). To further test direct and indirect effects of variables in the best model, structural equation modeling (SEM) was used as implemented in piecewiseSEM package63 of R (version 2.1.0). The prior model was constructed based on our hypothesis (see the section “Introduction”). The optimal model fit was achieved by subsequent iterative revision based on modification indices. For linear relationships, Spearman’s rank‐correlation coefficient was calculated in R. For fitting non-linear relationships between soil water content and N2O emissions, generalized additive model (GAM) were constructed using smoothing parameter estimated by marginal likelihood (REML) maximization, as implemented in mgcv package64 of R (version 1.8-33). We also compared the goodness of fit estimates between first and second order polynomial models for certain analyses. The best polynomial fit was determined on the basis of Akaike Information Criterion (AIC) scores using “AIC” and “poly” functions of R.
Source: Ecology - nature.com