Unique mobile elements and scalable gene flow at the prokaryote–eukaryote boundary revealed by circularized Asgard archaea genomes
Hydrothermal vent rock and sediment sample collectionRock no. NA091-R045 (source of Ca. H. endolithica PR6, Ca. H. repetitus FW102 and Thorarchaeote FW25) and rock no. NA091-R008 (source of Heimdall group Gerdarchaeote AC18) were retrieved from the Auka hydrothermal vent site situated on the margin of the southern Pescadero Basin of the Gulf of California using remotely operated vehicle Hercules during research expedition NA091 on E/V Nautilus on 2 November 2017. Local venting fluids have a measured temperature approaching 300 °C, contain hydrocarbons and hydrogen and are precipitating minerals, such as calcite and barite15. R045 was collected during dive H1658 at coordinates 23.956987786° N, 108.86227922° W at a water depth of 3,674 m, near shimmering water, a sign of locally focused hydrothermal fluid discharge. R008 was collected during dive H1657 at coordinates 23° 57′ N, 108° 52′ W at a water depth of 3,651 m. After shipboard recovery, rock samples were placed in Mylar bags prefilled with 0.2 µm filtered bottom seawater collected during the same dive, flushed with N2 gas for 10 min, sealed and stored at 4 °C until preparation for incubations in the laboratory.Sediment sample no. FK181031-S0193-PC3 (source of Ca. H. aukensis) was collected during the research expedition FK181031 on R/V Falkor to the southern Pescadero Basin on 14 November 2018. The sample was collected during dive S193 at the Auka hydrothermal vent site (23.954822° N, 108.863009° W, water depth of 3,657 m), near the site where rocks nos. NA091-R045 and NA091-R008 were collected in 2017. The sediment push core was extruded upwards and sectioned into discrete 3 cm depth horizons on board immediately after recovery, transferred into sterile Whirl-Pak bags and sealed in a larger Mylar bag, flushed with argon gas, heat-sealed and stored at 4 °C until use in the laboratory.Sample collection permits for the expedition were granted by the Dirección General de Ordenamiento Pesquero y Acuícola, Comisión Nacional de Acuacultura y Pesca (Permiso de Pesca de Fomento no. PPFE/DGOPA-200/18) and the Dirección General de Geografía y Medio Ambiente, Instituto Nacional de Estadística y Geografía (authorization no. EG0122018), with the associated diplomatic note no. 18-2083 (CTC/07345/18) from the Secretaría de Relaciones Exteriores-Agencia Mexicana de Cooperación Internacional para el Desarrollo/Dirección General de Cooperación Técnica y Científica.Artificial seawater medium recipeArtificial seawater was prepared as described in Scheller et al.47 with minor modifications. Briefly, 1 l of artificial seawater (ASW) medium contained 46.6 mM MgCl2, 9.2 mM CaCl2, 485 mM NaCl, 7 mM KCl, 20 mM Na2SO4, 1 mM K2HPO4, 2 mM NH4Cl, 1 ml of 1,000× trace element solution, 1 ml of 1,000× vitamin solution and 0.5 mg of resazurin and was buffered by 25 mM HEPES buffer adjusted to pH 7.5. One litre of 1,000× trace element solution contained 50 mM nitrilotriacetic acid, 5 mM FeCl3, 2.5 mM MnCl2, 1.3 mM CoCl2, 1.5 mM ZnCl2, 0.32 mM H3BO3, 0.38 mM NiCl2, 0.03 mM Na2SeO3, 0.01 mM CuCl2, 0.21 mM Na2MoO4 and 0.02 mM Na2WO4. One litre of 1,000× vitamin solution contained 82 μM d-biotin, 45 μM folic acid, 490 μM pyridoxine, 150 μM thiamine, 410 μM nicotinic acid, 210 μM pantothenic acid, 310 μM para-aminobenzoic acid, 240 μM lipoic acid, 14 μM choline chloride and 7.4 μM vitamin B12.Enrichment cultivationRock no. NA091-R045 was anaerobically fragmented; then, approximately 5 g wet weight was crushed using a sterile agate mortar and pestle on 8 November 2018 and immediately immersed in anaerobic ASW medium in 25–125 ml of butyl rubber-stoppered serum bottles supplemented with different carbon/energy sources, including lactate, H2/CO2, hexane and decane and incubated in the dark at 40 °C (Extended Data Fig. 1a). The headspace for all cultures was flushed and overpressurized with N2 gas (2 atm). For the H2-containing cultures, the N2 gas headspace was replaced with H2/CO2 at an 80:20 mixture by flushing for 1 min and subsequent equilibration at 2 atm. After 33 d of incubation, the lactate-fed first-generation culture produced 5 mM sulphide, indicating active sulphate reduction. This enrichment was mixed by gentle shaking and diluted 1:100 vol/vol into fresh anaerobic ASW medium containing the same suite of carbon/energy sources as described above (Extended Data Fig. 1b). A transfer using the liquid fraction-lacking rock particles from the primary lactate enrichment was also included to enrich for members of the planktonic community alone with lactate as the carbon and energy source. This enrichment was later found to be devoid of the AAG (Heimdall) phylotype. Third- and fourth-generation cultures were set up in the following months through 1:100 dilution (Extended Data Fig. 1b). Further details of microbial community development in these enrichments are provided in Supplementary Note 1 and Supplementary Tables 1–3.R008 was prepared as above except using 2 atm of methane in the headspace as the sole carbon source and electron donor. The culture was passaged twice using a 1:100 dilution under the same culturing conditions; the cell fraction was collected by centrifugation after a total of 22 months for metagenomic sequencing (described below).For sediment enrichment cultivation, the top 3 cm section of the sediment core was mixed with anaerobic ASW at a 1:4 vol/vol ratio; a total of 60 ml volume each was dispensed into seven 125 ml glass serum bottles sealed with butyl rubber stoppers. The headspace was replaced by ethane (2 atm) in 2 bottles (Supplementary Table 5), while the headspace in 1 bottle was replaced by 100% N2 gas (2 atm). The cultures were incubated at 37 °C in the dark. Further details on microbial community development are provided in Supplementary Note 1 and Supplementary Table 4.Mineralogical analysesThe mineralogical composition of rocks NA091-R045 and R008 was characterized on a PANalytical X’Pert Pro X-Ray diffractometer. A dried rock aliquot was finely powdered using a clean agate mortar and pestle and scanned from 3 to 75° (2θ angle) at a 0.0167° step size. Mineral identification was performed with the X’Pert HighScore software v4.1 using the search and march algorithm.DNA extractionCombined cells with rock or sediment substrate were pelleted through centrifugation at 13,000 r.p.m. for 3 min. For amplicon sequencing, unless specified in Supplementary Table 6, DNA was extracted using the Qiagen DNeasy PowerSoil kit (catalogue no. 47014) according to the manufacturer’s instructions as described previously48 with a minor modification, where mechanical shearing was carried out using the MP Biomedicals FastPrep-24 system (catalogue no. 116004500) at level 5.5 for 45 s. For genomic sequencing, incubated rock and sediment cultures were extracted using multiple approaches, including the Qiagen DNeasy PowerSoil kit, ZymoBIOMICS 96 MagBead DNA Kit (catalogue no. D4302; Zymo Research Corporation), Quick-DNA 96 Kit (catalogue no. D3010; Zymo Research Corporation), ZymoBIOMICS DNA Microprep Kit (catalogue no. D4301; Zymo Research Corporation) and a standard phenol/chloroform-based protocol. The list of samples and their extraction methods are provided in Supplementary Table 6.16S rRNA gene amplicon sequencingFor amplicon (iTAG) sequencing of 16S rRNA genes, extracted DNA was amplified using primer pair 515f/806r GTGCCAGCMGCCGCGGTAA/ GGACTACHVGGGTWTCTAAT, barcoded and sequenced at Laragen using the Illumina MiSeq platform and analysed using Qiime v.1.8.0 (ref. 49) as described previously48. Taxonomic assignment was based on the SILVA 138 database (https://www.arb-silva.de)50.Full-length 16S archaeal rRNA gene sequences were amplified using the archaeal primer pair SSU1Arf/SSU1492Rngs TCCGGTTGATCCYGCBRG/ CGGNTACCTTGTKACGAC as described by Bahram et al.51, multiplexed as instructed by PacBio and sequenced using the PacBio Sequel II at the Brigham Young University DNA Sequencing Center and then analysed using the DADA2 package v1.9.1 in R v3.6.0 as described in Callahan et al.52 using the SILVA 138 database for taxonomic classification. Note that in the SILVA 138 database, all Asgard archaea clades are classified under Asgardarchaeota.Metagenomic sequencingA total of 11 metagenomic sequencing runs were performed using the Illumina and Oxford Nanopore platforms, with details listed in Supplementary Table 6. For Illumina short-read sequencing, libraries were constructed using the NEBNext Ultra and Nextera Flex Library kits as specified in the Supplementary Table 6. Sequencing was carried out using a HiSeq 2500 system (single-end, 100 bp) at the Caltech Genetics and Genomics Laboratory and HiSeq 4000 system at Novogene (paired-end, 150 bp). Only paired-end data were used for assembly, while all data were used for error correction. Due to the low DNA quantity obtained from the sediment incubation that yielded Ca. H. aukensis, we used multiple displacement amplification with the QIAGEN REPLI g Midi Kit before library preparation for Nanopore sequencing. Oxford Nanopore sequencing libraries were constructed using the PCR Barcoding Kit (catalogue no. SQK-PBK004) and were sequenced on MinION flow cells FLO-MIN106. Base calling was performed with the ONT Guppy software v.3.4.5.Genome assembly, error correction and read coverage mappingTwo different approaches were used to assemble contiguous genomes from metagenomes. For species of interest, if Nanopore sequencing yielded high read coverage and read lengths N50 > 2 kb, we obtained more contiguous genomes through de novo assembly purely based on Nanopore reads. If Nanopore sequencing did not yield a high number of reads or exhibited low read lengths, we obtained more contiguous genomes through de novo assembly first based on Illumina reads and then joined using Nanopore reads.For Ca. H. endolithica, Nanopore sequencing data were assembled de novo using Canu17 v.2.1, which yielded a 30 Mbp assembly, including a 3.4 Mbp contig. The approximate 40 kilobase (kb) regions at two ends of an approximate 3.4 Mbp contig were repetitive. This repeated region was deleted at one end and the two ends were joined to result in a circular genome. The resulting genome was mapped using BamM (http://ecogenomics.github.io/BamM/, based on Burrows–Wheeler Aligner53 mapping) with 150 bp Illumina paired-end reads (88× coverage on average) and 100 bp single-end reads (20× coverage). Mapped reads were then used for error correction through pilon54 v.1.22. To account for the reduced mapping at the edges (approximate 50 bp region), the two ends of the genomic sequence were joined, read-mapped and error-corrected again using the same methods. After the genome was annotated, it was rotated such that the genomic sequence ended with tRNA (GlyCCC), which was the integration site of the putative provirus HeimV1. All sequencing reads derived from incubations of the same rock were mapped onto the final genome using BamM, which was then used for coverage calculation through bedtools (https://bedtools.readthedocs.io/en/latest/).For Ca. H. aukensis, Illumina PE150 bp sequencing data were assembled using SPAdes18 v.3.14.1 with the ‘-meta’ option and k-mers 21,33,55,77,99. The assembly was then scaffolded using Nanopore reads through two iterations of LRScaf55 v.1.1.10. The Ca. H. aukensis genome was joined after trimming the identical sequences at the two ends. The end-joining region was verified through PCR amplification and Sanger sequencing using the primer pair CGCTTTCTTCAAACAATATTTCTGGTG/CTTACTTTCTCTCGGTCCATTTTTCAC. Finally, a 1 kbp stretch of unresolved genomic sequence at an approximate 2.9 Mbp position was resequenced through PCR amplification and Sanger sequencing using the primers GAGTTTTTTCAATCTTATAATGCCAAACTAAAAAATAG (forward), CAGTCAGATTTGACACAATTTTGGTC (reverse) and GCTGGACTCAACCTATAACTAATAGT (reverse). The final assembly was read-mapped, error-corrected through pilon v.1.24 using 346× coverage. It was rotated as described above to place the tRNA gene GlyCCC at the end.The metagenome containing the Lokiarchaeote Ca. H. repetitus FW102 was assembled using Canu v.2.1, as described for the Ca. H. endolithica genome, and then binned using metabat2 v.2.15 (ref. 56) with default parameters. The bin was then used to recruit long reads using minimap2 v.2.17 and reassembled and binned again. We then used LRScaf to scaffold the contigs and used ten iterations of pilon v.1.24 to achieve error correction and resolve ambiguous bases.The Thorarcheote FW25 MAG was assembled using the hybrid assembly of Illumina reads and Nanopore reads using SPAdes v.3.14.1 with k-mers 21,33,55,77,99, and then binned using metabat2 v.2.15 with default parameters. The MAG bin was then used to recruit reads through MIRAbait in the MIRA v.4 package (http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#chap_intro). These reads were then used for hybrid assembly with Nanopore long reads via SPAdes v.3.14.1 with k-mers 21,33,55,77,99. It was then binned again using metabat2 v.2.15 with default parameters to yield the final Thorarcheote FW25 MAG.The metagenome containing Gerdarchaeote AC18 was assembled from Illumina reads using SPAdes v.3.14.1 with k-mers 21,33,55,77,99 and then binned using metabat2 v.2.15 with default parameters. The MAG bin was then used to recruit reads through MIRAbait in the MIRA v.4 package and then reassembled and binned using SPAdes and metabat2 to yield the final Gerdarchaeote AC18 bin.Alignment fraction, ANI and AAIANI and alignment fraction values, independently calculated for rRNA, tRNA and coding gene sequences were obtained using ANIcalculator57 2014-127, v.1.0 (https://ani.jgi.doe.gov/html/download.php?). Note that Lokiarchaeote FW102 contains 2 copies of 16S rRNA genes at 99% identity with each other, and Thorarchaeote BC has a partial 16S rRNA gene. The alignment of 16S rRNA was carried out using SINA58 v.1.2.11. The AAI values of translated proteomes were obtained with the enveomics package v1.8.059. The final output is shown in Supplementary Table 7.Genome and mobilome annotationsGene calling was done using a combination of Prodigal v.2.6.3 and Glimmer v.3.0.2 using translation code 11 within the RASTtk60 pipeline, now under the PATRIC package v1.03261. Translated coding sequences were annotated and domain-assigned using eggNOG mapper39 v.2. The tRNA, 16S rRNA and 23S rRNA genes were identified using RNAmmer62 v.1.2 embedded in RASTtk. Thus far, 5S rRNA gene sequences could not be predicted through the existing HMM using various approaches. Long, non-tandem repeats were identified using RASTtk with the default cut-off of 95% identity and 100 bp. Tandem repeat sequences were identified using RASTtk, Prokka v1.14.6 and CRISPRCasTyper 1.1.463. Prokka and CRISPRCasTyper both employ MinCED (https://github.com/ctSkennerton/minced) to identify repeats and detect intragenic tandem repeats, which were manually removed from the CRISPR–Cas analyses. The Cas genes were annotated using CRISRCasTyper.All identified Heimdallarchaeum mobilomes were further analysed using PSI-BLAST 1.10.064, CDD search v3.1965 and PhANNs webserver (version March 2021)37.Genome evaluation and HMM constructionMarker coverage was carried out using a two-step process. First, we used the automated marker analyses via CheckM66 v.1.1.3 with the lineage_wf option and the default HMM E value cut-off, which included the 149 standard archaeal single-copy marker set. Next, each of the missing markers was examined with hmmer67 v.3.3.2 using the hmmsearch option with manual inspection of alignment regions and bitscores. This rescued markers unidentified through the default cut-offs by CheckM as well as divergent variants that most likely functionally replace the genuinely missing marker. The detailed description of markers missed by CheckM can be found in Supplementary Note 2 and the final evaluation of marker presence is displayed in Extended Data Fig. 4a and Supplementary Table 15. Next, we constructed an updated HMM set to replace the CheckM set by (1) updating all HMM to the most recent versions, (2) removing the six commonly missing or duplicated markers shown in Extended Data Fig. 4a from the list and (3) overcoming the pitfall of existing HMMs constructed using only a few sequences acquired from Euryarchaeota and Crenarchaeota. We manually constructed Asgard-specific versions based on the 282 Asgard archaea genomes. The HMMs constructed in this study are PF00832.ASG, PF00861.ASG, PF01194.ASG, PF01287.ASG, PF01667.ASG, PF03874.ASG, PF03876.ASG, PF13656.ASG, TIGR00270.ASG, TIGR00336.ASG, TIGR00442.ASG, TIGR02338.ASG and TIGR03677.ASG. The updated HMM file has been provided as a supplementary data file. The updated HMM was used to evaluate the 282 genomes reported in this study and in the literature3,6,7,8,9,10,11,12,16,23,26,68,69,70,71,72,73,74,75,76,77 through (1) CheckM, which uses Prodigal for gene calling, and (2) the more up to date HMMER3.2.2 on our gene calls described above. The latter generally produced slightly higher completeness and redundancy values (Supplementary Tables 8 and 9). For the expanded set of Asgard archaea genomes used for the phylogenomic analyses shown in Extended Data Fig. 4b, we applied the following filtering criteria: ≤100 contigs, >96% marker completeness and 20% sequence identity, >85% sequence alignment and 30% sequence identity, >90% sequence alignment and More