Sampling and incubation
Four rock samples were collected from the 3.7 km-deep Auka vent field in the Southern Pescadero Basin (23.956094N, 108.86192W)20,23. Sample NA091.008 was collected in 2017 on cruise NA091 with the Eexploration vessle Nautilus and incubated as described previously34. Samples 12,019 (S0200-R1), 11,719 (S0193-R2) and 11,868 (S0197-PC1), the latter representing a lithified nodule recovered from a sediment push core, were collected with Remotely operated vehicle SuBastian and Research vessel Falkor on cruise FK181031 in November 2018. These samples were processed shipboard and stored under anoxic conditions at 4 °C for subsequent incubation in the laboratory. In the laboratory, rock samples 12,019 and 11,719 were broken into smaller pieces under sterile conditions, immersed in N2-sparged sterilized artificial sea water and incubated under anoxic conditions with methane, as described previously for NA091.008 (ref. 34). Additional sampling information can be found in Supplementary Table 1. Mineralogical analysis by X-ray Powder Diffraction (XRD) identified barite in several of these samples, collected from two locations in the Auka vent field, including on the western side of the Matterhorn vent (11,719, NA091.008), and one oil-saturated sample (12,019) recovered from the sedimented flanks from the southern side of Z vent. Our analysis also includes metagenomic data from two sediment cores from the Auka vent field (DR750-PC67 and DR750-PC80) collected in April 2015 with the ROV Doc Ricketts and R/V Western Flyer (MBARI2015), previously published (ref. 23).
Fluorescence in situ hybridization
Samples were fixed shipboard using freshly prepared paraformaldehyde (2 vol% in 3× Phosphate Buffer Solution (PBS), EMS15713) at 4 °C overnight, rinsed twice using 3× PBS, and stored in ethanol (50% in 1× PBS) at −20 °C until processing. Small pieces (<1 cm3) of the mineral sample NA091.008 were gently crushed in a sterile agate mortar and pestle in a freshly prepared, filter sterilized 80% ethanol – 1× PBS solution. About 500 μl of the resulting mixture was sonicated three times in 15 second bursts on a Branson Sonifier W-150 ultrasonic cell disruptor (level 3) on ice with a sterile remote-tapered microtip probe inserted into the liquid. Cells were separated from the mineral matrix using an adapted protocol of density separation using Percoll (Sigma P4937)7. The density-separated cells were filtered on 25 mm polycarbonate filters with a pore size of 0.22 μm (Millipore GTTP2500), and rinsed using 1× PBS. Fluorescence in situ hybridizations were carried out as described previously7 using a 1:1 mixture of an ANME-1 targeted probe (ANME-1-3509 labelled with Cy3) and the general bacterial probe mix EUB338 I-III (https://probebase.csb.univie.ac.at/), labelled with Alexa-488 in a 35% formamide solution (VWR EM-FX0420-8). Hybridized samples were imaged using a ×100 objective using a Zeiss Elyra structured illumination microscope with the Zen Black software.
DNA extraction and sequencing
DNA extraction from the mineral samples followed previously published protocols34. Metagenomic analysis from the extracted genomic DNA was outsourced to Quick Biology (Pasadena, CA) for library preparation and sequencing. Libraries were prepared with the KAPA Hyper plus kit using 10 ng of DNA as input. This input was subjected to enzymatic fragmentation at 37 °C for 10 min. After end repair and A-tailing, the DNA was ligated with an IDT adaptor (Integrated DNA Technologies Inc.). Ligated DNA was amplified with KAPA HiFi HotStart ReadyMix (2×) for 11 cycles. Post-amplification cleanup was performed with 1× KAPA pure beads. The final library quality and quantity were analysed and measured by Agilent Bioanalyzer 2100 (Agilent Technologies) and Life Technologies Qubit 3.0 Fluorometer (Life Technologies), respectively. Finally, the libraries were sequenced using 150 bp paired-end reads on Illumina HiSeq4000 Sequencer (Illumina Inc.). After sequencing, primers and adaptors were removed from all libraries using bbduk (https://sourceforge.net/projects/bbmap/) with mink = 6 and hdist = 1 as trimming parameters, and establishing a minimum quality value of 20 and a minimal length of 50 bp. For nanopore sequencing of incubated samples, DNA was amplified using multiple displacement amplification with the QIAGEN REPLI-g Midi kit before library preparation. Oxford Nanopore sequencing libraries were constructed using the PCR-free barcoding kit and were sequenced on PromethION platform by Novogene Inc.
Metagenomic analysis
The sequencing reads from unincubated rocks were assembled individually and in a coassembly using SPAdes v.3.12.0 (ref. 53). From the de-novo assemblies, we performed manual binning using Anvio v.6 (ref. 54). We assessed the quality and taxonomy affiliation from the obtained bins using GTDB-tk v.1.5.0 (ref. 55) and checkM v.1.13 (ref. 56). Genomes affiliated to ANME-1 and Syntrophoarchaeales were further refined via a targeted-reassembly pipeline. In this pipeline, the original reads were mapped to the bin of interest using bbmap (https://sourceforge.net/projects/bbmap/), then the mapped reads were assembled using SPAdes and the resulting assembly was filtered discarding contigs below 1,500 bp. This procedure was repeated during several rounds (between 11 and 50) for each bin, until we could not see an improvement in the bin quality. Bin quality was assessed using the checkM and considering the completeness, contamination (<5%), N50 value and number of scaffolds. The resulting bins were considered as MAGs. The sequencing reads for the incubated rocks 12,019 and 11,719 were assembled as described previously for NA091.R00834. Additionally, the assembly of 12,019 was then scaffolded using Nanopore reads through two iterations of LRScaf v.1.1.10 (ref. 57). The final assemblies were binned using metabat2 v.2.15 (ref. 58) using the default setting. Automatic metabolic prediction of the MAGs was performed using prokka v.1.14.6 (ref. 59) and curated with the identification of PFAM and TIGRFAM profiles using HMMER v.3.3 (hmmer.org), KEGG orthologs with Kofamscan60 and of COGs and arCOGs motifs61. To identify multiheme cytochromes in our genomes, we searched the motif CXXCH across the amino acid sequences predicted for each MAG. Similar metabolic predictions were carried out with publicly available ANME-1 and Syntrophoarchaeales genomes to compare the metabolic potential of the whole ANME-1 order. A list of the genomes used in this study can be found in Supplementary Table 2. For the comparison of different genomic features among the ANME-1 genomes, we searched for specific proteins using the assigned COGs, arCOGs and KEGG identifiers (Supplementary Table 5).
Genomic relative abundance analysis
We used the software coverM v.0.5 (https://github.com/wwood/CoverM) to calculate the genomic relative abundance of the different organisms of our samples, using all the MAGs we have extracted from our metagenomic analysis. We ran the software with the following additional parameters for dereplication (‘–dereplication-ani 95–dereplication-prethreshold-ani 90–dereplication-precluster-method finch’). Results were visualized in R v.4.2.1.
OGT analysis
We calculated the OGT for all ANME-1 and Syntrophoarchaeales MAGs included in our analysis (Supplementary Table 2) using the OGT_prediction tool described in Sauer and Wang24 with the regression models for Archaea excluding rRNA features and genome size.
Analysis of hydrogenase operons
Because only two of the five genomes of ‘Candidatus Methanospirare jalkutatii’ have an operon encoding a hydrogenase, we performed additional analysis to better understand this intraspecies distribution. On the one hand, we mapped the metagenomic reads from samples with genomes of ‘Candidatus Methanospirare jalkutatii’ (12019, FW4382_bin126, NA091.008, PR1007, PR1031B) to the MAGs containing the hydrogenase operon (FW4382_bin126, NA091.008_bin1) to check if reads mapping this operon are also present in samples from where the MAGs without the hydrogenase were recovered. For mapping the reads, we used bowtie2 v.2.4.2 (ref. 62) then transformed the sam files to bam using samtools (http://www.htslib.org/) and extracted the coverage depth for each position. Additionally, we performed a genomic comparison of the genomes with a hydrogenase operon (FW4382_bin126, NA091.008_bin1) with the genome FWG175 that was assembled into a single scaffold. For this, we used the genome-to-genome aligner Sibelia v.3.0.7 (ref. 63) and we visualized the results using Circos (http://circos.ca/).
Phylogenetic analysis
For the phylogenomic tree of the ANME-1 MAGs, we used the list of genomes present in Supplementary Table 2. As marker genes, we used 31 single copy genes (Supplementary Table 5) that we extracted and aligned from the corresponding genomes using anvi-get-sequences-for-hmm-hits from Anvio v.6 (ref. 54) with the parameters ‘–return-best-hit–max-num-genes-missing-from-bin 7–partition-file’. Seven genomes missed more than seven marker genes and were not used for the phylogenomic reconstruction present in Fig. 1 (ANME-1 UWMA-0191, Syntrophoarchaeum GoM_oil, ANME-1 ERB7, ANME-1 Co_bin174, ANME-1 Agg-C03, PB_MBMC_218, FW4382_bin035). The concatenated aligned marker gene set was then used to calculate a phylogenomic tree with RAxML v.8.2.12 (ref. 64) using a partition file to calculate differential models for each gene the following parameters ‘-m PROTGAMMAAUTO -f a -N autoMRE -k.’ The tree was then visualized using iTol65. For the clustering of the MAGs into different species, we dereplicated the ANME-1 MAGs using dRep v.2.6.2 with the parameter ‘-S_ani 0.95’ (ref. 66). A smaller phylogenomic tree was calculated with the genomes containing hydrogenase genes (Fig. 3). For this tree we also used Anvio v.6 and RAxML v.8.2.12 with the same parameters but excluding the flag ‘—max-num-genes-missing-from-bin’ from the anvi-get-sequences-for-hmm-hits command to include in the analysis those genomes with a lower number of marker genes that still contain hydrogenase genes (PB_MBMC_218, FW4382_bin035, ANME-1 UWMA-0191).
The 16S rRNA gene phylogenetic tree was calculated for the 16S rRNA genes predicted from our genome dataset that were full length. We included these full-length 16S rRNA genes in the SILVA_132_SSURef_NR99 database67 and with ARB v.6.1 (ref. 68) we calculated a 16S phylogenetic tree using the maximum-likelihood algorithm RAxML with GTRGAMMA as the model and a 50% similarity filter. In total, 1,000 bootstrap analyses were performed to calculate branch support values. The tree with the best likelihood score was selected.
For the construction of the hydrogenase phylogenetic tree (Supplementary Table 6), we used the predicted protein sequence for the large subunit of the NiFe hydrogenase present in the genomes of our dataset (Supplementary Table 2), a subset of the large subunit hydrogenases present in the HydDB database30 and the predicted hydrogenases present in an archaeal database using the COG motif for the large NiFe hydrogenase (COG0374) with the Anvio v.6 software. For the mcrD gene phylogeny, we used the predicted protein sequences of mcrD in the ANME-1c genomes and in the previously mentioned archaeal database with the TIGR motif TIGR03260.1 using also the Anvio v.6 software. The list of genomes from the archaeal database used in the analysis can be found in Supplementary Table 6. For both phylogenies, the protein sequences for the analysis were aligned using clustalw v.2.1 with default settings69. The aligned file was used to calculate a phylogenetic tree using RAxML v.8.2.12 (ref. 64) with the following parameters ‘-m PROTGAMMAAUTO -f a -N 100 –k’. The tree was then visualized using iTol65.
For the distribution and phylogenetic analysis of MCP and pPolB, known sequences encoded by various bacterial and archaeal viruses were used to build a Hidden Markov Model (HMM) via hmmer v.3.3.2. The HMM was then used to capture the corresponding components in proteomes of ANME-1 viruses and other MGEs. All sequences were then aligned using MAFFT v.7.475 (ref. 70) option linsi and trimmed using trimAl v1.4.1 (ref. 71) option gappyout for pPolB and 20% gap removal option for MCP. Maximum-likelihood analyses were carried out through IQtree v.2.1.12 (ref. 72) using model finder and ultrafast bootstrap with 2,000 replicates. The phylogenetic tree was visualized and prepared using iTol65.
For the distribution and phylogenetic analysis of ThyX, all ThyX sequences annotated by EggNOG mapper73 v.2 in the genomes of ANME-1 and their MGEs were used to create a HMM as described above, and used to search for close homologues in the GTDB202 database, IMGVR V.3 database and again in the proteomes and ANME-1 and their MGEs in this study. This yielded 261 sequences, which was then aligned and phylogenetically analysed as for pPolB.
CRISPR analysis
The CRISPR–Cas systems from the ANME-1 genomes and various metagenomic assemblies were annotated using CRISPRCasTyper v.1 (ref. 33). CRISPR spacer mapping on MGEs was carried out as previously described34 with the following modifications. To filter out unreliable sequences that may have arisen during MAG binning, we took a conservative measure of only retaining CRISPR repeats identified in at least three ANME-1 contigs. We additionally analysed the CRISPR repeats found in the Alkanophagales sister clade to ANME-1 using the same approach, which were found to have no overlap with the ANME-1 CRISPR repeats. To avoid accidental mapping to unrelated MGEs, we applied a second stringent criteria of only retaining MGEs with at least three ANME-1 protospacers. MGEs larger than 10 kb were retained for further analyses in this study.
MGE network analysis and evaluation
Open reading frames in all CRISPR-mapped MGE contigs were identified using the PATRIC package74. Gene similarity network analyses were done using vCONTACT v.2.0 (ref. 75) using the default reference (RefSeq202), with head-tailed viruses targeting haloarchaea and methanogens added as extra references42. Inverted and direct terminal repeats were detected using CheckV v.1.01. and the PATRIC package to determine genome completeness. Clustering confidence were obtained with default setting as described in ref. 75, where the P value was obtained via a one-sided Mann–Whitney U test and the topology confidence is obtained by multiplying the quality score of the subcluster and the P value.
MGE annotation and virus identification
MGE proteomes are annotated using sensitive HMM profile-profile comparisons with HHsearch v.3.3.2 (ref. 76) against the following publicly available databases: Pfam 33.1, Protein Data Bank (25 March 2021), CDD v.3.18, PHROG and uniprot_sprot_vir70 (9 February 2021)77. Putative MCP of Chaacviridae and Ixchelviridae could not be identified using sequence similarity-based approaches. Thus, the candidate proteins were subjected to structural modelling using AlphaFold2 (ref. 38) and RoseTTAFold v.1.1.0 (ref. 39). The obtained models were visualized using ChimeraX78 and compared with the reference structure of the MCP of corticovirus PM2 (PDB id: 2vvf). The contigs containing identifiable viral structural proteins are described as viruses. The remaining contigs are described as unclassified MGEs, including circular elements that are most likely plasmids of ANME-1 and possible viruses enveloped by yet unknown structural proteins.
Genome-scale virus comparisons
The viral genomes were annotated using Prokka v.1.14.6 (ref. 59) to produce genbank files. Select genbank files were then analysed using Clinker v.0.0.23 (ref. 79) to produce the protein sequence clustering and alignments. Proteome-scale phylogeny for the head-tailed viruses were carried out via the VipTree server43.
Etymology
Descriptions of proposed ANME-1c family and species
Family ‘Candidatus Methanospirareceae’. N.L. neut. n. methanum methane; N.L. pref. methano-, pertaining to methane; L.v. spirare, to breathe. Proposed classification: class Methanomicrobia, order ‘Candidatus Methanophagales’. The type species and strain is ‘Candidatus Methanospirare jalkutatii’ FWG175.
‘Candidatus Methanoxibalbensis ujae’. N.L. neut. n. methanum methane; N.L. pref. methano-, pertaining to methane; N.L. adj. xibalbensis, from the place called Xibalba, the Mayan word for the underworld; N.L. neut. n. Methanoxibalbensis methane-cycling organism present in deep-sea hydrothermal sediments; N.L. neut. adj. ujae, from the word ujá, meaning rock in Kiliwa, an indigenous language of the native peoples of Baja California, referring to the high abundance of this species in rock samples. Proposed classification: class Methanomicrobia, order ‘Candidatus Methanophagales’, family ‘Candidatus Methanoxibalbaceae’, genus ‘Candidatus Methanoxibalbensis’.
The material type is the genome designated NA091.008_bin2 (GCA_026134085.1), a MAG comprising 1.99 Mbp in 86 scaffolds. The MAG was recovered from mineral sample (NA091.008) from the hydrothermal environment of South Pescadero Basin.
‘Candidatus Methanospirare jalkutatii’. N.L. neut. n. methanum methane; N.L. pref. methano-, pertaining to methane; L.v. spirare, to breathe; N.L. neut. n. Methanospirare methane-breathing organism; N.L. masc. n. jalkutatii, a mythical dragon from stories of the indigenous Pa ipai people from Northern Baja, California. This dragon inhabited a beautiful place made of rocks and water similar to the Auka hydrothermal vent site. Proposed classification: class Methanomicrobia, order ‘Candidatus Methanophagales’, family ‘Candidatus Methanoxibalbaceae’, genus ‘Candidatus Methanospirare’.
The material type is the genome designated FWG175 (CP110382.1), a single-scaffolded MAG comprising 1.62 Mbp in one circular scaffold. This MAG was recovered from a methane-fed incubation of the mineral sample 12,019 retrieved from the hydrothermal environment of South Pescadero Basin.
Proposed classification of ANME-1 viruses
The order Coyopavirales is proposed within the existing class Tectiliviricetes, after Coyopa, the god of thunder in Mayan mythology. It contains tailless icosahedral viruses with previously unreported class of DJR MCPs and little proteome overlap with known viruses. The family Chaacviridae is proposed within Coyopavirales, after Chaac, the god of death in the Mayan mythology. It is characterized by a uniform 10–11 kb genome and a gene encoding protein-primed family B DNA polymerase (pPolB). We propose the genus names Homochaacvirus and Antichaacvirus (from homo, for same in Greek, and anti, for opposed in Greek, to emphasize the inversion of a gene module including the pPolB gene). Six complete genomes of chaacviruses have been obtained: Methanophagales virus PBV304 (OP548099) within sepcies Homochaacvirus pescaderoense, Methanophagales virus PBV305 (OP548100) within species Homochaacvirus californiaense, Methanophagales virus GBV261, Methanophagales virus GBV265, Methanophagales virus GBV275 and Methanophagales virus PBV266 (OP413841) within species Antichaacvirus pescaderoense. The candidate family Ixchelviridae is proposed within Coyopavirales, after Ix Chel, goddess of midwifery and medicine in the Mayan mythology. Ixchelviridae is represented by Pescadero Basin viruses PBV176 and PBV180, with assembly completeness unknown.
Candidate family Huracanviridae is proposed without higher-level ranking classification, after Hurancan, god of wind, storm and fire in Mayan mythology. It contains tailless icosahedral viruses with single jelly-roll MCPs. It is represented by Pescadero Basin viruses PBV264 and PBV238, with assembly completeness undetermined.
The order Nakonvirales is proposed within Caudoviricetes, after Nakon, the most powerful god of war in Mayan mythology. It contains head-tailed viruses with around 80 kb genomes and HK97-fold MCPs. The family Ahpuchviridae (after Ah Puch, the god of death in the Mayan mythology) includes one genus, Kisinvirus, (after Kisin, another Mayan god of death) and is represented by a single virus, Methanophagales virus PBV299 (OP413838) within species Kisinvirus pescaderoense. The family Ekchuahviridae (after Ek Chuah, the patron god of warriors and merchants in Mayan mythology), is represented by one genus, Kukulkanvirus (after Kukulkan, the war serpent in the Mayan mythology). It includes Methanophagales virus GBV301 (OP880252) within species Kukulkanvirus guaymasense and Methanophagales virus GBV302 (OP880253) within species Kukulkanvirus mexicoense, each encoding two divergent HK97-fold MCPs with their own capsid maturation proteases.
Seven other candidate families of head-tailed viruses are proposed without complete genome representatives. They form a phylogenetic cluster sister to Haloviruses (Fig. 5a), and according to the phylogenetic classifications of the latter, likely form multiple unclassified order-level clades. These candidate families are Acanviridae, Alomviridae, Bacabviridae, Baalhamviridae, Cabrakanviridae, Cacochviridae and Chiccanviridae, all named after gods in Mayan mythology.
The order Maximonvirales is proposed within Tokiviricetes, after Maximon, a god of travellers, merchants, medicine men/women, mischief and fertility in Mayan mythology. It contains rod-shaped viruses of a single family Ahmunviridae (after Ah Mun, the god of agriculture in Mayan mythology) with a single genus Yumkaaxvirus (after Yum Kaax, the god of the woods, the wild nature and the hunt in Mayan mythology). It is represented by the complete linear genome of Methanophagales virus PBV300 (OP413840) within species Yumkaaxvirus pescaderoense.
The family Itzamnaviridae is named after Itzamna, lord of the heavens and night and day in Mayan mythology. It contains spindle-shaped viruses that differ in genome sizes and are subdivided into two genera, which we propose naming Demiitzamnavirus and Pletoitzamnavirus (after demi– for half or partial, derived via French from Latin dimedius and pleto for full in Latin). They are respectively represented by complete genomes of Methanophagales virus GBV170 within species Demiitzamnavirus guaymasense, Methanophagales virus GBV303 (OP880254) within species Demiitzamnavirus mexicoense and Methanophagales virus PBV082 (OP413839) within species Pletoitzamnavirus pescaderoense.
Candidate families Tepeuviridae and Votanviridae, named after a skye god Tepeu and a legendary ancestral deity Votan, respectively, are proposed for two additional new clades of spindle-shaped viruses. Their genome representatives Tepeuvirus PBV144 and Votanvirus IMGVR0294848 are not yet circularized and are thus incomplete.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Source: Ecology - nature.com