Selective enrichment and metagenomic analysis of three novel comammox Nitrospira in a urine-fed membrane bioreactor

Bioreactor operation and sampling

A continuous-flow MBR made from Plexiglass with a working volume of 12 L was used for enrichment (Supplementary Fig. S1). The reactor was installed with a submerged hollow fiber ultrafiltration membrane module (0.02 μm pore size, Litree, China) with a total membrane surface area of 0.03 m². A level control system was set up to prevent liquid overflowing. The reactor was fed with diluted real urine with Total Kjeldahl Nitrogen (TKN) concentration of 140–405 mg N L⁻¹ (for detailed influent composition see Supplementary Table S1). Initially, the reactor was inoculated with activated sludge taken from the aeration tank of a municipal wastewater treatment plant (Tsinghua Campus Water Reuse). The pH was maintained at 6.0 ± 0.1 by adding 1 M NaOH to buffer acidification by ammonia oxidation. The airflow was controlled at 2 L min⁻¹, leading to the dissolved oxygen (DO) concentration above 4 mg O₂ L⁻¹ as regularly measured by a DO probe (WTW Multi 3420). The airflow also served to wash the membrane and mix the liquid. The temperature was controlled at 22–25 °C. The initial hydraulic retention time (HRT) was 3 days and was decreased to 1.5 days on day 222. The sludge retention time (SRT) was infinite as no biomass was discharged.

The MBR was operated for 490 days. During this period, influent and effluent samples (10 mL each) were collected 1–3 times per week and used to determine the concentrations of TKN, total nitrite nitrogen (TNN = NO₂⁻-N + HNO₂-N), and nitrate nitrogen, according to standard methods.¹⁹ Mixed liquid samples (25 mL) were also taken weekly to measure mixed-liquor suspended solids (MLSS) and mixed-liquor volatile suspended solids (MLVSS).¹⁹ Biomass samples (10 mL) were regularly taken for qPCR and microbial community analyses (see below).

Batch tests

In order to test urea hydrolysis and subsequent nitrification in the enrichment culture, short-term incubations were performed in a cylindrical batch reactor (8 ×18.5 cm [d × h], made from Plexiglass). 150 mL biomass was sampled from the reactor and washed three times in 1 x PBS buffer to remove any remaining nitrogen source. Subsequently, the biomass was resuspended in a 400 mL growth medium, which contained urea (about 40 mg N L⁻¹), NaHCO₃ (120 mg L⁻¹), and 2 mL Hunter’s trace elements stock. Dissolved oxygen was controlled above 4 mg O₂ L⁻¹. Biotic and abiotic controls were performed under identical conditions with NH₄Cl (~40 mg N L⁻¹) instead of urea. The pH in all batch assays was maintained at 6.0 ± 0.1 by adding 1 M HCl or NaOH. According to the microbial activities during long-term operation, each batch assay lasted 6 to 8 h, and samples (5 mL) were taken every 20 to 60 min. Biomass was removed by sterile syringe filter (0.45 μm pore size, JINTENG, China), and urea, ammonium, nitrite, and nitrate concentrations were determined as described above. All experiments were performed in triplicate.

DNA extraction

Biomass (2 mL) for DNA extraction was collected on days 0, 53, 98, 131, 161, 189, 210, 238, 266, 301, 321, 358, 378, 449, and 471. DNA was extracted using the FastDNA™ SPIN Kit for Soil (MP Biomedicals, CA, U.S.) according to the manufacturer’s protocols. DNA purity and concentration were examined using agarose gel electrophoresis and spectrophotometrically on a NanoDrop 2000 (ThermoFisher Scientific, Waltham, MA, USA).

16S rRNA gene amplicon sequencing and data analysis

The V4-V5 region of the 16 S rRNA gene was amplified using the universal primers 515F (5′-barcode-GTGCCAGCMGCCGCGG-3′) and 907 R (5′-CCGTCAATTCMTTTRAGTTT-3′).²⁰ PCR products were purified using the AxyPrep DNA Gel Extraction Kit (Axygen Biosciences, Union City, CA, USA) according to manufacturer’s instructions and quantified using the QuantiFluor™ -ST (Promega, USA). Amplicons were pooled in equimolar concentrations and sequenced using the Illumina MiSeq PE3000 sequencer as per the manufacturer’s protocol. Amplicon sequences were demultiplexed and quality filtered using QIIME (version 1.9.1).²¹ Reads <50 bp were discarded and all remaining paired-end reads with an overlap >10 bp were assembled. UPARSE (version 7.0.1090 http://drive5.com/uparse/) was used to cluster operational units (OTUs) on a 97% similarity cut-off level, and UCHIME to identify and remove chimeric sequences. The taxonomy of each 16S rRNA gene sequence was assigned by the RDP Classifier algorithm (http://rdp.cme.msu.edu/) according to the SILVA (SSU132) 16S rRNA database using a confidence threshold of 70%.

Quantification of various amoA by qPCR

To quantify the abundances of comammox Nitrospira, AOB and AOA in the bioreactor, qPCR targeting the functional marker gene amoA was performed on DNA extracted from the bioreactor at different time points. We used the specific primers Ntsp-amoA 162F/359R amplifying comammox Nitrospira clades A and clade B simultaneously,¹² Arch-amoAF/amoAR targeting AOA amoA,²² and amoA-1F/amoA-2R for AOB amoA.²³ Reactions were conducted on a Bori 9600plus fluorescence quantitative PCR instrument using previously reported thermal profiles (Supplementary Table S2). Triplicate PCR assays were performed the appropriately diluted samples (10–30 ng μL⁻¹) and 10-fold serially diluted plasmid standards as described by Guo et al.²⁴. Plasmid standards containing the different amoA variants were obtained by TA-cloning with subsequent plasmid DNA extraction using the Easy Pure Plasmid MiniPrep Kit (TransGen Biotech, China). Standard curves covered three to eight orders of magnitude with R² greater than 0.999. The efficiency of qPCR was about 95%.

Library construction and metagenomic sequencing

The extracted DNA was fragmented to an average size of about 400 bp using Covaris M220 (Gene Company Limited, China) for paired-end library construction. A paired-end library was constructed using NEXTFLEX Rapid DNA-Seq (Bioo Scientific, Austin, TX, USA). Adapters containing the full complement of sequencing primer hybridization sites were ligated to the blunt-end of fragments. Paired-end sequencing was performed on Illumina NovaSeq PE150 (Illumina Inc., San Diego, CA, USA) at Majorbio Bio-Pharm Technology Co., Ltd. (Shanghai, China) using NovaSeq Reagent Kits according to the manufacturer’s instructions (www.illumina.com).

Metagenomic assembly and genome binning

Raw metagenomic sequencing reads (in PE150 mode) were trimmed and quality filtered with in-house Perl scripts as described previously.²⁵ Briefly, duplicated reads caused by the PCR bias during the amplification step were dereplicated. Reads were eliminated if both paired-end reads contained >10% ambiguous bases (that is, “N”). Low-quality bases with phred values <20 at both sides were trimmed. This resulted in on average 57 million high-quality pair-ended reads for each dataset. Quality-filtered reads from each sample were assembled separately using SPAdes v3.9.0²⁶ with the following parameters: –meta –k 33,55,77,99,127. Contigs >2.5 kbp were retained for later analysis. Genome binning was conducted for each sample using sequencing depth and tetranucleotide frequency. To calculate coverage, high-quality reads from all samples were mapped to the contigs using BBMap v38.85 (http://sourceforge.net/projects/bbmap/) with minimal identity set to 90%. The generated bam files were sorted using samtools v1.3.1.²⁷ Then, sequencing depth was calculated using the script “jgi_summarize_bam_contig_depths” in MetaBAT.²⁸ Metagenome-assembled genomes (MAGs) were obtained in MetaBAT. MAG quality, including completeness, contamination, and heterogeneity, was estimated using CheckM v1.0.12.²⁹ To optimize the MAGs, emergent self-organizing maps³⁰ were used to visualize the bins, and contigs with abnormal coverage or discordant tetranucleotide frequencies were removed manually. Finally, all MAGs were reassembled using SPAdes with the following parameters: –careful –k 21,33,55,77,99,127. The reads used for reassembly were recruited by mapping all high-quality reads to each MAG using BBMap with the same parameter settings as described above.

Functional annotation of metagenomic assemblies and metagenome-assembled genomes

Gene calling was conducted for the complete metagenomic assemblies and all retrieved MAGs using Prodigal v2.6.3.³¹ For the MAGs, predicted protein-coding sequences (CDSs) were subsequently aligned to a manually curated database containing amoCAB, hao, and nxrAB genes collected from public database using DIAMOND v0.7.9 (E-values < 1e−5 ³²) MAGs found to contain all these genes were labeled as comammox Nitrospira MAGs and kept for later analysis. Functional annotations were obtained by searching all CDSs in the complete metagenomic assemblies and the retrieved MAGs against the NCBI-nr, eggNOG, and KEGG databases using DIAMOND (E-values < 1e−5).

Phylogenetic analysis

Phylogenomic tree

The taxonomic assignment of the three identified comammox Nitrospira MAGs was determined using GTDB-tk v0.2.2.³³ To reveal the phylogenetic placement of these MAGs within the Nitrospirae, 296 genomes from this phylum were downloaded from the NCBI-RefSeq database. The download genomes were dereplicated using dRep v2.3.2³⁴ (-con 10 -comp 80) to reduce the complexity and redundancy of the phylogenetic tree, which resulted in the removal of 166 genomes. In the remaining genomes, the three comammox Nitrospira MAGs and 25 genomes from phylum Thermotogae which were treated as outgroups, a set of 16 ribosomal proteins were identified using AMPHORA2.³⁵ Each gene set was aligned separately using MUSCLE v3.8.31 with default parameters,³⁶ and poorly aligned regions were filtered by TrimAl v1.4.rev22 (-gt 0.95 –cons 50³⁷) The individual alignments of the 16 marker genes were concatenated, resulting in an alignment containing 118 species and 2665 amino acid positions. Subsequently, the best phylogenetic model LG + F + R8 was determined using ModelFinder³⁸ integrated into IQ-tree v1.6.10.³⁹ Finally, a phylogenetic tree was reconstructed using IQ-tree with the following options: -bb 1000 –alrt 1000. The generated tree in newick format was visualized by iTOL v3.⁴⁰

amoA tree

Reference amoA sequences of AOB, AOA, and comammox Nitrospira were obtained from NCBI. Together with the amoA genes from the present study, all sequences were aligned and trimmed as described above. IQ-tree was used to generate the phylogenetic tree, with “LG + G4” determined as the best model.

ureABC gene tree

ureABC gene sequences detected in this study were extracted and used to build a database using “hmmbuild” command in HMMER.⁴¹ ureABC gene sequences from genomes in NCBI-RefSeq database (downloaded on July 1st, 2019) were identified by searching against the built database using AMPHORA2. The same procedures as above were conducted to construct the phylogenetic tree of concatenated ureABC genes, except for the sequence collection step. To reduce the complexity of the phylogenetic tree, the alignment of concatenated ureABC genes was clustered using CD-HIT⁴² with the following parameters: -aS 1 -c 0.8 -g 1. Only representative sequences were kept for phylogeny reconstruction, which resulted in an alignment containing 858 sequences and 1263 amino acids positions. “LG + R10” was determined as the best model and used to build the phylogenetic tree. Regarding the Nitrospirae-specific ureABC gene tree, ureABC gene sequences were recruited from the genomes as described above, but without the sequence clustering step. The final Nitrospirae-specific phylogeny of ureABC genes was built on an alignment containing 62 sequences and 1015 amino acid positions with “LG + F + I + G4” as the best model.

Source: Ecology - nature.com