Abstract
Marine algae underpin entire ocean ecosystems. Yet algae in culture poorly represent their large environmental diversity, and we have a limited understanding of their convoluted evolution by endosymbiosis. Here, we perform a phylogeny-guided plastid genome-resolved metagenomic survey of Tara Oceans expeditions. We present a curated resource of 660 new non-redundant plastid genomes of environmental marine algae, vastly expanding plastid genome diversity within major algal groups, including many without closely related reference genomes. Notably, we recover four plastid genomes, including one near-complete, forming a deep-branching plastid lineage of nano-size algae that we informally name leptophytes. This group is globally distributed and generally rare, although it can reach relatively high abundance in the Arctic. A near-complete mitochondrial genome showing strong co-occurrence with leptophyte plastids is also recovered and assigned to this group. Leptophytes encompass the enigmatic plastid group DPL2, one of the very few known plastid groups not clearly belonging to major algal groups and previously known only from 16S rDNA sequences. Comparative organellar genomics and phylogenomics indicate that leptophytes are sister to haptophytes, and raise the intriguing possibility that cryptophytes acquired their plastids from haptophytes. Collectively, our study demonstrates that metagenomics can reveal hidden organellar diversity, and improve models of plastid evolution.
Data availability
The 937 metagenomes from Tara Oceans used in the study are publicly available at the EBI under project PRJEB402. The data generated in this study have been deposited in an online repository: https://doi.org/10.17044/scilifelab.28212173. This link provides access to the individual FASTA files for each plastid and mitochondrial genome used in our study (including the 660 non-redundant ptMAGs and 34 mtMAGs), the co-assembly of the top six samples where Lepto-01 was most abundant, individual gene alignments, concatenated and trimmed alignments, and maximum-likelihood and Bayesian tree files for the phylogenomic dataset. Source data for Fig. 4, Supplementary Figs. 10–12, and Supplementary Fig. 22 can be found on the linked GitHub repository, while source data for Supplementary Figs. 2–6 are provided as Supplementary Data 1.
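As an illustration of how one of the deposited FASTA files might be inspected after download, the following minimal Python sketch reports the number of contigs, total length and GC fraction of a single file. It is not part of the study's pipeline: the function names `read_fasta` and `summarise_fasta` are hypothetical, and the demonstration uses a small in-memory sequence rather than any deposited genome.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarise_fasta(handle: TextIO) -> Dict[str, float]:
    """Return contig count, total length and GC fraction for one FASTA file."""
    n_contigs = total = gc = 0
    for _, seq in read_fasta(handle):
        seq = seq.upper()
        n_contigs += 1
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_fraction = gc / total if total else 0.0
    return {"contigs": n_contigs, "length": total, "gc_fraction": gc_fraction}


if __name__ == "__main__":
    # Toy in-memory example; replace with open("<downloaded file>.fasta")
    # to summarise a file retrieved from the repository.
    demo = io.StringIO(">contig_1\nACGTGG\n>contig_2\nAT\n")
    print(summarise_fasta(demo))
```

The parser is deliberately dependency-free; for larger analyses an established library such as Biopython would be a more robust choice.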
Code availability
All scripts used for genome annotation and phylogenetic analyses are available on GitHub (https://github.com/burki-lab/ptMAGs) and are archived under the identifier https://doi.org/10.5281/zenodo.17635604.
References
Andersen, R. A. Diversity of eukaryotic algae. Biodivers. Conserv. 1, 267–292 (1992).
Archibald, J. M. The puzzle of plastid evolution. Curr. Biol. 19, R81–R88 (2009).
Irisarri, I., Strassert, J. F. H. & Burki, F. Phylogenomic insights into the origin of primary plastids. Syst. Biol. 71, 105–120 (2022).
Keeling, P. J. The endosymbiotic origin, diversification and fate of plastids. Philos. Trans. R. Soc. B Biol. Sci. 365, 729–748 (2010).
Penot, M., Dacks, J. B., Read, B. & Dorrell, R. G. Genomic and meta-genomic insights into the functions, diversity and global distribution of haptophyte algae. Appl. Phycol. 3, 340–359 (2022).
Malviya, S. et al. Insights into global diatom distribution and diversity in the world’s ocean. Proc. Natl. Acad. Sci. USA 113, E1516–E1525 (2016).
Dorrell, R. G. et al. Chimeric origins of ochrophytes and haptophytes revealed through an ancient plastid proteome. eLife 6, e23717 (2017).
Stiller, J. W. et al. The evolution of photosynthesis in chromist algae through serial endosymbioses. Nat. Commun. 5, 5764 (2014).
Ševčíková, T. et al. Updating algal evolutionary relationships through plastid genome sequencing: did alveolate plastids emerge through endosymbiosis of an ochrophyte? Sci. Rep. 5, 10134 (2015).
Keeling, P. J. et al. The marine microbial eukaryote transcriptome sequencing project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12, e1001889 (2014).
Moreira, D. & López-García, P. The rise and fall of Picobiliphytes: how assumed autotrophs turned out to be heterotrophs. BioEssays 36, 468–474 (2014).
Kawachi, M. et al. Rappemonads are haptophyte phytoplankton. Curr. Biol. 31, 2395–2403 (2021).
Choi, C. J. et al. Newly discovered deep-branching marine plastid lineages are numerically rare but globally distributed. Curr. Biol. 27, R15–R16 (2017).
Schön, M. E. et al. Single cell genomics reveals plastid-lacking Picozoa are close relatives of red algae. Nat. Commun. 12, 6651 (2021).
Sunagawa, S. et al. Tara Oceans: towards global ocean ecosystems biology. Nat. Rev. Microbiol. 18, 428–445 (2020).
Delmont, T. O. et al. Functional repertoire convergence of distantly related eukaryotic plankton lineages abundant in the sunlit ocean. Cell Genomics 2, 100123 (2022).
Ruscheweyh, H.-J. et al. Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments. Microbiome 10, 212 (2022).
Dmitrijeva, M. et al. The mOTUs online database provides web-accessible genomic context to taxonomic profiling of microbial communities. Nucleic Acids Res. 53, D797–D805 (2025).
Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).
Eren, A. M. et al. Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ 3, e1319 (2015).
Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat. Microbiol. 6, 3–6 (2021).
Kim, J. I. et al. Evolutionary dynamics of cryptophyte plastid genomes. Genome Biol. Evol. 9, 1859–1872 (2017).
de Vargas, C. et al. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605 (2015).
Muñoz-Gómez, S. A. & Slamovits, C. H. Plastid Genomes in the Myzozoa. in Advances in Botanical Research (eds Chaw, S.-M. & Jansen, R. K.) vol. 85 55–94 (Academic Press, 2018).
Matsuo, E. et al. Comparative plastid genomics of green-colored dinoflagellates unveils parallel genome compaction and RNA editing. Front. Plant Sci. 13, 918543 (2022).
Kamikawa, R. et al. Plastid genome-based phylogeny pinpointed the origin of the green-colored plastid in the dinoflagellate Lepidodinium chlorophorum. Genome Biol. Evol. 7, 1133–1140 (2015).
de Vries, J. & Archibald, J. M. Plastid genomes. Curr. Biol. 28, R336–R337 (2018).
Ha, J.-S. et al. Plastid genome evolution of two colony-forming benthic Ochrosphaera neapolitana strains (Coccolithales, Haptophyta). Int. J. Mol. Sci. 24, 10485 (2023).
Baños, H., Susko, E. & Roger, A. J. Is over-parameterization a problem for profile mixture models? Syst. Biol. 73, 53–75 (2024).
Szánthó, L. L., Lartillot, N., Szöllősi, G. J. & Schrempf, D. Compositionally constrained sites drive long-branch attraction. Syst. Biol. 72, 767–780 (2023).
Williamson, K. et al. A robustly rooted tree of eukaryotes reveals their excavate ancestry. Nature 640, 974–981 (2025).
Muñoz-Gómez, S. A. et al. Site-and-branch-heterogeneous analyses of an expanded dataset favour mitochondria as sister to known Alphaproteobacteria. Nat. Ecol. Evol. 6, 253–262 (2022).
Crotty, S. M. et al. GHOST: recovering historical signal from heterotachously evolved sequence alignments. Syst. Biol. 69, 249–264 (2020).
Susko, E. & Roger, A. J. On reduced amino acid alphabets for phylogenetic inference. Mol. Biol. Evol. 24, 2139–2150 (2007).
Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).
Pierella Karlusich, J. J. et al. A robust approach to estimate relative phytoplankton cell abundances from metagenomes. Mol. Ecol. Resour. 23, 16–40 (2023).
Hirakawa, Y. & Ishida, K.-I. Polyploidy of endosymbiotically derived genomes in complex algae. Genome Biol. Evol. 6, 974–980 (2014).
Shrestha, B. et al. Global metagenomics reveals plastid diversity and unexplored algal lineages. bioRxiv 2025.03.28.644651 https://doi.org/10.1101/2025.03.28.644651 (2025).
Strassert, J. F. H., Irisarri, I., Williams, T. A. & Burki, F. A molecular timescale for eukaryote evolution with implications for the origin of red algal-derived plastids. Nat. Commun. 12, 1879 (2021).
McFadden, G. I. The cryptomonad nucleomorph. Protoplasma 254, 1903–1907 (2017).
Tanifuji, G. et al. Complete Nucleomorph Genome Sequence of the Nonphotosynthetic Alga Cryptomonas paramecium Reveals a Core Nucleomorph Gene Set. Genome Biol. Evol. 3, 44–54 (2011).
Rice, D. W. & Palmer, J. D. An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol. 4, 31 (2006).
Khan, H. et al. Plastid genome sequence of the cryptophyte alga Rhodomonas salina CCMP1319: lateral transfer of putative DNA replication machinery and a test of chromist plastid phylogeny. Mol. Biol. Evol. 24, 1832–1842 (2007).
Burki, F., Roger, A. J., Brown, M. W. & Simpson, A. G. B. The new tree of eukaryotes. Trends Ecol. Evol. 35, 43–55 (2020).
Ponce-Toledo, R. I., Moreira, D., López-García, P. & Deschamps, P. Molecular phylogeny of the SELMA translocation machinery recounts the evolution of complex photosynthetic eukaryotes. Mol. Biol. Evol. msaf167 https://doi.org/10.1093/molbev/msaf167 (2025).
Pietluch, F., Mackiewicz, P., Ludwig, K. & Gagat, P. A new model and dating for the evolution of complex plastids of red alga origin. Genome Biol. Evol. 16, evae192 (2024).
Gaïa, M. et al. Mirusviruses link herpesviruses to giant viruses. Nature 616, 783–789 (2023).
Gaïa, M. et al. Closest relatives of poxviruses are spread in the gut of humans and animals worldwide: the egoviruses. 2024.03.23.586382 Preprint at https://doi.org/10.1101/2024.03.23.586382 (2025).
Martinez-Gutierrez, C. A. & Aylward, F. O. Phylogenetic signal, congruence, and uncertainty across bacteria and archaea. Mol. Biol. Evol. 38, 5514–5527 (2021).
Shaw, J. & Yu, Y. W. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nat. Methods 20, 1661–1665 (2023).
Puerta, M. V. S., Bachvaroff, T. R. & Delwiche, C. F. The complete plastid genome sequence of the haptophyte Emiliania huxleyi: a comparison to other plastid genomes. DNA Res. 12, 151–156 (2005).
Butenko, A., Lukeš, J., Speijer, D. & Wideman, J. G. Mitochondrial genomes revisited: why do different lineages retain different genes? BMC Biol. 22, 15 (2024).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Lang, B. F. et al. Mitochondrial genome annotation with MFannot: a critical analysis of gene identification and gene model prediction. Front. Plant Sci. 14, 1222186 (2023).
Greiner, S., Lehwark, P. & Bock, R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 47, W59–W64 (2019).
Janouškovec, J., Horák, A., Oborník, M., Lukeš, J. & Keeling, P. J. A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc. Natl. Acad. Sci. USA 107, 10949–10954 (2010).
Ponce-Toledo, R. I. et al. An Early-Branching Freshwater Cyanobacterium at the Origin of Plastids. Curr. Biol. 27, 386–391 (2017).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Kozlov, A., Darriba, D., Flouri, T., Morel, B. & Stamatakis, A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics 35, 4453–4455 (2019).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Whelan, S., Irisarri, I. & Burki, F. PREQUAL: detecting non-homologous characters in sets of unaligned homologous sequences. Bioinformatics 34, 3929–3930 (2018).
Susko, E., Lincker, L. & Roger, A. J. Accelerated estimation of frequency classes in site-heterogeneous profile mixture models. Mol. Biol. Evol. 35, 1266–1283 (2018).
Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 611–615 (2013).
Liu, C. et al. The influence of the number of tree searches on maximum likelihood inference in phylogenomics. Syst. Biol. 73, 807–822 (2024).
Shimodaira, H. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508 (2002).
Yu, G. Using ggtree to visualize data on tree-like structures. Curr. Protoc. Bioinform. 69, e96 (2020).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Boyer, T. P. et al. World ocean database 2013. National Oceanographic Data Center (U.S.), Ocean Climate Laboratory https://doi.org/10.7289/V5NZ85MT (2013).
Aumont, O., Ethé, C., Tagliabue, A., Bopp, L. & Gehlen, M. PISCES-v2: an ocean biogeochemical model for carbon and ecosystem studies. Geosci. Model Dev. 8, 2465–2513 (2015).
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Guillou, L. et al. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 41, D597–D604 (2013).
Mahé, F., Henry, N., de Vargas, C., Tara Oceans Consortium Coordinators & Tara Oceans Expedition Participants. rDNA 18S V9 metabarcoding tables (Swarm) for Tara Oceans Expedition (2009–2013), including Tara Polar Circle Expedition (2013). Zenodo (2022).
Zavadska, D., Henry, N., Auladell, A., Berney, C. & Richter, D. J. Diverse patterns of correspondence between protist metabarcodes and protist metagenome-assembled genomes. PLOS ONE 19, e0303697 (2024).
Wickham, H. Ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag, 2016).
Aphalo, P. ggpmisc: Miscellaneous Extensions to ‘ggplot2’. (Springer, 2025).
Shulgina, Y. & Eddy, S. R. Codetta: predicting the genetic code from nucleotide sequence. Bioinformatics 39, btac802 (2022).
Jamy, M. Data for ‘Identification of a deep-branching lineage of algae using environmental plastid genomes’. figshare https://doi.org/10.17044/scilifelab.28212173 (2025).
Jamy, M. Code for ‘Identification of a deep-branching lineage of algae using environmental plastid genomes’. Zenodo https://doi.org/10.5281/zenodo.17635604 (2025).
Keeling, P. J. & Eglit, Y. Openly available illustrations as tools to describe eukaryotic microbial diversity. PLoS Biol. 21, e3002395 (2023).
Acknowledgements
We thank Shinichi Sunagawa for facilitating the recovery of relevant data from the mOTUs metagenomic database maintained by his research group at the Department of Biology at ETH Zürich. We thank A. Roger, H. Baños, and C. McCarthey for discussions, and for kindly providing custom scripts for running the phylogenetic models MEOW and GF-MIX. We thank J.E. Dharamshi for discussions about phylogenetic analyses. Our survey was made possible by the sampling and sequencing efforts of the Tara Oceans Project. Tara Oceans (which includes the Tara Oceans and Tara Oceans Polar Circle expeditions) would not exist without the leadership of the Tara Oceans Foundation and the continuous support of 23 institutes (https://oceans.taraexpeditions.org/). This article is contribution number 164 of Tara Oceans. Phylogenetic analyses were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS 2024/5-197), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. M.J. was supported by the Swedish Research Council (International Postdoc grant 2022-00351). F.B.’s research is supported by grants from the European Research Council (ERC consolidator grant 101044505), the Swedish Research Council VR (2021-04055), and Science for Life Laboratory. T.O.D.’s research is supported by a grant from the Agence Nationale de la Recherche (ANR-23-CE02-0022). We also acknowledge the commitment of the CNRS and Genoscope/CEA. Some of the computations were performed using the platine, titane and curie HPC machines provided through GENCI grants (t2011076389, t2012076389, t2013036389, t2014036389, t2015036389 and t2016036389).
Funding
Open access funding provided by Uppsala University.
Author information
Contributions
F.B. and T.O.D. conceived the project. T.O.D. characterised the ptMAGs. M.J., F.B. and T.H. performed phylogenetic analyses. T.A., E.P. and T.O.D. created the plastid genomic database and performed surveys for nucleomorphs. H.J.R. and L.P. retrieved relevant data from mOTUs. M.J. and T.H. annotated the ptMAGs, and E.P. performed mapping analyses. F.B., M.J., and T.O.D. wrote the manuscript with input from all the authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks John Archibald and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jamy, M., Huber, T., Antoine, T. et al. Identification of a deep-branching lineage of algae using environmental plastid genomes. Nat. Commun. (2025). https://doi.org/10.1038/s41467-025-67401-4
