Identification of a deep-branching lineage of algae using environmental plastid genomes
Abstract
Marine algae underpin entire ocean ecosystems. Yet algae in culture poorly represent their large environmental diversity, and we have a limited understanding of their convoluted evolution by endosymbiosis. Here, we perform a phylogeny-guided plastid genome-resolved metagenomic survey of Tara Oceans expeditions. We present a curated resource of 660 new non-redundant plastid genomes of environmental marine algae, vastly expanding plastid genome diversity within major algal groups, including many without closely related reference genomes. Notably, we recover four plastid genomes, including one near-complete, forming a deep-branching plastid lineage of nano-size algae that we informally name leptophytes. This group is globally distributed and generally rare, although it can reach relatively high abundance in the Arctic. A near-complete mitochondrial genome showing strong co-occurrence with leptophyte plastids is also recovered and assigned to this group. Leptophytes encompass the enigmatic plastid group DPL2, one of the very few known plastid groups not clearly belonging to major algal groups and previously known only from 16S rDNA sequences. Comparative organellar genomics and phylogenomics indicate that leptophytes are sister to haptophytes, and raise the intriguing possibility that cryptophytes acquired their plastids from haptophytes. Collectively, our study demonstrates that metagenomics can reveal hidden organellar diversity, and improve models of plastid evolution.
Data availability
The 937 metagenomes from Tara Oceans used in the study are publicly available at the EBI under project PRJEB402. The data generated in this study have been deposited in an online repository: https://doi.org/10.17044/scilifelab.28212173. This link provides access to the individual FASTA files for each plastid and mitochondrial genome used in our study (including the 660 non-redundant ptMAGs and 34 mtMAGs), the co-assembly of the top six samples where Lepto-01 was most abundant, individual gene alignments, concatenated and trimmed alignments, and maximum-likelihood and Bayesian tree files for the phylogenomic dataset. Source data for Fig. 4, Supplementary Figs. 10–12, and Supplementary Fig. 22 can be found on the linked GitHub repository, while source data for Supplementary Figs. 2–6 are provided as Supplementary Data 1.
Code availability
All scripts used for genome annotation and phylogenetic analyses are available on GitHub (https://github.com/burki-lab/ptMAGs) and archived on Zenodo under the identifier 10.5281/zenodo.17635604.
Acknowledgements
We thank Shinichi Sunagawa for facilitating the recovery of relevant data from the mOTU metagenomic database maintained by his research group at the Department of Biology at ETH Zürich. We thank A. Roger, H. Baños, and C. McCarthey for discussions, and for kindly providing custom scripts for running the phylogenetic models MEOW and GF-MIX. We thank J.E. Dharamshi for discussions about phylogenetic analyses. Our survey was made possible by the sampling and sequencing efforts of the Tara Oceans Project. Tara Oceans (which includes the Tara Oceans and Tara Oceans Polar Circle expeditions) would not exist without the leadership of the Tara Oceans Foundation and the continuous support of 23 institutes (https://oceans.taraexpeditions.org/). This article is contribution number 164 of Tara Oceans. Phylogenetic analyses were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS 2024/5-197), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. M.J. was supported by the Swedish Research Council (International Postdoc grant 2022-00351). F.B.’s research is supported by grants from the European Research Council (ERC consolidator grant 101044505), the Swedish Research Council VR (2021-04055), and Science for Life Laboratory. T.O.D.’s research is supported by a grant from the Agence Nationale de la Recherche (ANR-23-CE02-0022). We also acknowledge the commitment of the CNRS and Genoscope/CEA. Some of the computations were performed using the platine, titane and curie HPC machines provided through GENCI grants (t2011076389, t2012076389, t2013036389, t2014036389, t2015036389 and t2016036389).
Funding
Open access funding provided by Uppsala University.
Author information
Author notes
Eric Pelletier & Tom O. Delmont
Present address: Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOsee, Paris, France
Authors and Affiliations
Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden: Mahwash Jamy
Department of Organismal Biology, Program in Systematic Biology, Uppsala University, Uppsala, Sweden: Thomas Huber & Fabien Burki
Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France: Thibault Antoine, Eric Pelletier & Tom O. Delmont
Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOsee, Paris, France: Thibault Antoine
Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland: Hans-Joachim Ruscheweyh
Global Health Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland: Lucas Paoli
Contributions
F.B. and T.O.D. conceived the project. T.O.D. characterised the ptMAGs. M.J., F.B. and T.H. performed phylogenetic analyses. T.A., E.P. and T.O.D. created the plastid genomic database and performed surveys for nucleomorphs. H.J.R. and L.P. retrieved relevant data from mOTUs. M.J. and T.H. annotated the ptMAGs, and E.P. performed mapping analyses. F.B., M.J., and T.O.D. wrote the manuscript with input from all the authors.
Corresponding authors
Correspondence to Tom O. Delmont or Fabien Burki.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks John Archibald and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Descriptions of Additional Supplementary Files
Supplementary Dataset 1
Supplementary Dataset 2
Supplementary Dataset 3
Supplementary Dataset 4
Supplementary Dataset 5
Supplementary Dataset 6
Reporting Summary
Transparent Peer Review file
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jamy, M., Huber, T., Antoine, T. et al. Identification of a deep-branching lineage of algae using environmental plastid genomes. Nat Commun (2025). https://doi.org/10.1038/s41467-025-67401-4
Received: 06 February 2025
Accepted: 28 November 2025
Published: 14 December 2025
DOI: https://doi.org/10.1038/s41467-025-67401-4
