Identification of a deep-branching lineage of algae using environmental plastid genomes

Abstract

Marine algae underpin entire ocean ecosystems. Yet algae in culture poorly represent their large environmental diversity, and we have a limited understanding of their convoluted evolution by endosymbiosis. Here, we perform a phylogeny-guided plastid genome-resolved metagenomic survey of Tara Oceans expeditions. We present a curated resource of 660 new non-redundant plastid genomes of environmental marine algae, vastly expanding plastid genome diversity within major algal groups, including many without closely related reference genomes. Notably, we recover four plastid genomes, including one near-complete, forming a deep-branching plastid lineage of nano-size algae that we informally name leptophytes. This group is globally distributed and generally rare, although it can reach relatively high abundance in the Arctic. A near-complete mitochondrial genome showing strong co-occurrence with leptophyte plastids is also recovered and assigned to this group. Leptophytes encompass the enigmatic plastid group DPL2, one of the very few known plastid groups not clearly belonging to major algal groups and previously known only from 16S rDNA sequences. Comparative organellar genomics and phylogenomics indicate that leptophytes are sister to haptophytes, and raise the intriguing possibility that cryptophytes acquired their plastids from haptophytes. Collectively, our study demonstrates that metagenomics can reveal hidden organellar diversity, and improve models of plastid evolution.

Data availability

The 937 metagenomes from Tara Oceans used in this study are publicly available at the EBI under project PRJEB402. The data generated in this study have been deposited in an online repository: https://doi.org/10.17044/scilifelab.28212173 (ref. 81). This link provides access to the individual FASTA files for each plastid and mitochondrial genome used in our study (including the 660 non-redundant ptMAGs and 34 mtMAGs), the co-assembly of the six samples in which Lepto-01 was most abundant, individual gene alignments, concatenated and trimmed alignments, and maximum-likelihood and Bayesian tree files for the phylogenomic dataset. Source data for Fig. 4, Supplementary Figs. 10-12, and Supplementary Fig. 22 can be found in the linked GitHub repository, while source data for Supplementary Figs. 2-6 are provided as Supplementary Data 1.
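As a reader-side illustration only, the deposited FASTA files could be inspected with a few lines of standard-library Python. The sketch below is not part of the study's pipeline; the function names `read_fasta` and `summarise` and the toy record are assumptions introduced here, and the file layout inside the repository is not assumed.

```python
from io import StringIO
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarise(lines: Iterable[str]) -> Dict[str, float]:
    """Return contig count, total length and GC fraction for one genome file."""
    n_contigs = total = gc = 0
    for _, seq in read_fasta(lines):
        seq = seq.upper()
        n_contigs += 1
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    return {"contigs": n_contigs, "length": total, "gc": gc / total if total else 0.0}


# Toy record standing in for a downloaded file (hypothetical content).
example = StringIO(">contig_1\nATGC\nGGCC\n>contig_2\nATAT\n")
stats = summarise(example)
print(stats)
```

For a downloaded genome, the same function can be called on an open file handle, for instance `with open(path) as handle: summarise(handle)`, where `path` points to whichever FASTA file was retrieved from the repository.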

Code availability

All scripts used for genome annotation and phylogenetic analyses are available on GitHub: https://github.com/burki-lab/ptMAGs, with the identifier 10.5281/zenodo.17635604 (ref. 82).

References

  1. Andersen, R. A. Diversity of eukaryotic algae. Biodivers. Conserv. 1, 267–292 (1992).

  2. Archibald, J. M. The puzzle of plastid evolution. Curr. Biol. 19, R81–R88 (2009).

  3. Irisarri, I., Strassert, J. F. H. & Burki, F. Phylogenomic insights into the origin of primary plastids. Syst. Biol. 71, 105–120 (2022).

  4. Keeling, P. J. The endosymbiotic origin, diversification and fate of plastids. Philos. Trans. R. Soc. B Biol. Sci. 365, 729–748 (2010).

  5. Penot, M., Dacks, J. B., Read, B. & Dorrell, R. G. Genomic and meta-genomic insights into the functions, diversity and global distribution of haptophyte algae. Appl. Phycol. 3, 340–359 (2022).

  6. Malviya, S. et al. Insights into global diatom distribution and diversity in the world’s ocean. Proc. Natl. Acad. Sci. USA 113, E1516–E1525 (2016).

  7. Dorrell, R. G. et al. Chimeric origins of ochrophytes and haptophytes revealed through an ancient plastid proteome. eLife 6, e23717 (2017).

  8. Stiller, J. W. et al. The evolution of photosynthesis in chromist algae through serial endosymbioses. Nat. Commun. 5, 5764 (2014).

  9. Ševčíková, T. et al. Updating algal evolutionary relationships through plastid genome sequencing: did alveolate plastids emerge through endosymbiosis of an ochrophyte? Sci. Rep. 5, 10134 (2015).

  10. Keeling, P. J. et al. The marine microbial eukaryote transcriptome sequencing project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12, e1001889 (2014).

  11. Moreira, D. & López-García, P. The rise and fall of Picobiliphytes: how assumed autotrophs turned out to be heterotrophs. BioEssays 36, 468–474 (2014).

  12. Kawachi, M. et al. Rappemonads are haptophyte phytoplankton. Curr. Biol. 31, 2395–2403 (2021).

  13. Choi, C. J. et al. Newly discovered deep-branching marine plastid lineages are numerically rare but globally distributed. Curr. Biol. 27, R15–R16 (2017).

  14. Schön, M. E. et al. Single cell genomics reveals plastid-lacking Picozoa are close relatives of red algae. Nat. Commun. 12, 6651 (2021).

  15. Sunagawa, S. et al. Tara Oceans: towards global ocean ecosystems biology. Nat. Rev. Microbiol. 18, 428–445 (2020).

  16. Delmont, T. O. et al. Functional repertoire convergence of distantly related eukaryotic plankton lineages abundant in the sunlit ocean. Cell Genomics 2, 100123 (2022).

  17. Ruscheweyh, H.-J. et al. Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments. Microbiome 10, 212 (2022).

  18. Dmitrijeva, M. et al. The mOTUs online database provides web-accessible genomic context to taxonomic profiling of microbial communities. Nucleic Acids Res. 53, D797–D805 (2025).

  19. Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).

  20. Eren, A. M. et al. Anvi’o: an advanced analysis and visualization platform for ’omics data. PeerJ 3, e1319 (2015).

  21. Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat. Microbiol. 6, 3–6 (2021).

  22. Kim, J. I. et al. Evolutionary dynamics of cryptophyte plastid genomes. Genome Biol. Evol. 9, 1859–1872 (2017).

  23. de Vargas, C. et al. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605 (2015).

  24. Muñoz-Gómez, S. A. & Slamovits, C. H. Plastid Genomes in the Myzozoa. in Advances in Botanical Research (eds Chaw, S.-M. & Jansen, R. K.) vol. 85 55–94 (Academic Press, 2018).

  25. Matsuo, E. et al. Comparative plastid genomics of green-colored dinoflagellates unveils parallel genome compaction and RNA editing. Front. Plant Sci. 13, 918543 (2022).

  26. Kamikawa, R. et al. Plastid genome-based phylogeny pinpointed the origin of the green-colored plastid in the dinoflagellate Lepidodinium chlorophorum. Genome Biol. Evol. 7, 1133–1140 (2015).

  27. de Vries, J. & Archibald, J. M. Plastid genomes. Curr. Biol. 28, R336–R337 (2018).

  28. Ha, J.-S. et al. Plastid genome evolution of two colony-forming benthic Ochrosphaera neapolitana Strains (Coccolithales, Haptophyta). Int. J. Mol. Sci. 24, 10485 (2023).

  29. Baños, H., Susko, E. & Roger, A. J. Is Over-parameterization a problem for profile mixture models? Syst. Biol. 73, 53–75 (2024).

  30. Szánthó, L. L., Lartillot, N., Szöllősi, G. J. & Schrempf, D. Compositionally constrained sites drive long-branch attraction. Syst. Biol. 72, 767–780 (2023).

  31. Williamson, K. et al. A robustly rooted tree of eukaryotes reveals their excavate ancestry. Nature 640, 974–981 (2025).

  32. Muñoz-Gómez, S. A. et al. Site-and-branch-heterogeneous analyses of an expanded dataset favour mitochondria as sister to known Alphaproteobacteria. Nat. Ecol. Evol. 6, 253–262 (2022).

  33. Crotty, S. M. et al. GHOST: recovering historical signal from heterotachously evolved sequence alignments. Syst. Biol. 69, 249–264 (2020).

  34. Susko, E. & Roger, A. J. On reduced amino acid alphabets for phylogenetic inference. Mol. Biol. Evol. 24, 2139–2150 (2007).

  35. Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).

  36. Pierella Karlusich, J. J. et al. A robust approach to estimate relative phytoplankton cell abundances from metagenomes. Mol. Ecol. Resour. 23, 16–40 (2023).

  37. Hirakawa, Y. & Ishida, K.-I. Polyploidy of endosymbiotically derived genomes in complex algae. Genome Biol. Evol. 6, 974–980 (2014).

  38. Shrestha, B. et al. Global metagenomics reveals plastid diversity and unexplored algal lineages. bioRxiv 2025.03.28.644651 https://doi.org/10.1101/2025.03.28.644651 (2025).

  39. Strassert, J. F. H., Irisarri, I., Williams, T. A. & Burki, F. A molecular timescale for eukaryote evolution with implications for the origin of red algal-derived plastids. Nat. Commun. 12, 1879 (2021).

  40. McFadden, G. I. The cryptomonad nucleomorph. Protoplasma 254, 1903–1907 (2017).

  41. Tanifuji, G. et al. Complete Nucleomorph Genome Sequence of the Nonphotosynthetic Alga Cryptomonas paramecium Reveals a Core Nucleomorph Gene Set. Genome Biol. Evol. 3, 44–54 (2011).

  42. Rice, D. W. & Palmer, J. D. An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol. 4, 31 (2006).

  43. Khan, H. et al. Plastid genome sequence of the cryptophyte alga Rhodomonas salina CCMP1319: lateral transfer of putative DNA replication machinery and a test of chromist plastid phylogeny. Mol. Biol. Evol. 24, 1832–1842 (2007).

  44. Burki, F., Roger, A. J., Brown, M. W. & Simpson, A. G. B. The new tree of eukaryotes. Trends Ecol. Evol. 35, 43–55 (2020).

  45. Ponce-Toledo, R. I., Moreira, D., López-García, P. & Deschamps, P. Molecular phylogeny of the SELMA translocation machinery recounts the evolution of complex photosynthetic eukaryotes. Mol. Biol. Evol. msaf167 https://doi.org/10.1093/molbev/msaf167 (2025).

  46. Pietluch, F., Mackiewicz, P., Ludwig, K. & Gagat, P. A new model and dating for the evolution of complex plastids of red alga origin. Genome Biol. Evol. 16, evae192 (2024).

  47. Gaïa, M. et al. Mirusviruses link herpesviruses to giant viruses. Nature 616, 783–789 (2023).

  48. Gaïa, M. et al. Closest relatives of poxviruses are spread in the gut of humans and animals worldwide: the egoviruses. 2024.03.23.586382 Preprint at https://doi.org/10.1101/2024.03.23.586382 (2025).

  49. Martinez-Gutierrez, C. A. & Aylward, F. O. Phylogenetic signal, congruence, and uncertainty across bacteria and archaea. Mol. Biol. Evol. 38, 5514–5527 (2021).

  50. Shaw, J. & Yu, Y. W. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nat. Methods 20, 1661–1665 (2023).

  51. Puerta, M. V. S., Bachvaroff, T. R. & Delwiche, C. F. The complete plastid genome sequence of the haptophyte Emiliania huxleyi: a comparison to other plastid genomes. DNA Res. 12, 151–156 (2005).

  52. Butenko, A., Lukeš, J., Speijer, D. & Wideman, J. G. Mitochondrial genomes revisited: why do different lineages retain different genes? BMC Biol. 22, 15 (2024).

  53. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  54. Lang, B. F. et al. Mitochondrial genome annotation with MFannot: a critical analysis of gene identification and gene model prediction. Front. Plant Sci. 14, 1222186 (2023).

  55. Greiner, S., Lehwark, P. & Bock, R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 47, W59–W64 (2019).

  56. Janouškovec, J., Horák, A., Oborník, M., Lukeš, J. & Keeling, P. J. A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc. Natl. Acad. Sci. USA 107, 10949–10954 (2010).

  57. Ponce-Toledo, R. I. et al. An Early-Branching Freshwater Cyanobacterium at the Origin of Plastids. Curr. Biol. 27, 386–391 (2017).

  58. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

  59. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).

  60. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).

  61. Kozlov, A., Darriba, D., Flouri, T., Morel, B. & Stamatakis, A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics 35, 4453–4455 (2019).

  62. Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).

  63. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).

  64. Whelan, S., Irisarri, I. & Burki, F. PREQUAL: detecting non-homologous characters in sets of unaligned homologous sequences. Bioinformatics 34, 3929–3930 (2018).

  65. Susko, E., Lincker, L. & Roger, A. J. Accelerated estimation of frequency classes in site-heterogeneous profile mixture models. Mol. Biol. Evol. 35, 1266–1283 (2018).

  66. Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 611–615 (2013).

  67. Liu, C. et al. The influence of the number of tree searches on maximum likelihood inference in phylogenomics. Syst. Biol. 73, 807–822 (2024).

  68. Shimodaira, H. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508 (2002).

  69. Yu, G. Using GGTREE to visualize data on tree-like structures. Curr. Protoc. Bioinform. 69, e96 (2020).

  70. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).

  71. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).

  72. Boyer, T. P. et al. World ocean database 2013. National Oceanographic Data Center (U.S.), Ocean Climate Laboratory https://doi.org/10.7289/V5NZ85MT (2013).

  73. Aumont, O., Ethé, C., Tagliabue, A., Bopp, L. & Gehlen, M. PISCES-v2: an ocean biogeochemical model for carbon and ecosystem studies. Geosci. Model Dev. 8, 2465–2513 (2015).

  74. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).

  75. Guillou, L. et al. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 41, D597–D604 (2013).

  76. Mahé, F., Henry, N., de Vargas, C., Tara Oceans Consortium, C. & Tara Oceans Expedition, P. rDNA 18S V9 metabarcoding tables (Swarm) for Tara Oceans Expedition (2009-2013), including Tara Polar Circle Expedition (2013). Zenodo (2022).

  77. Zavadska, D., Henry, N., Auladell, A., Berney, C. & Richter, D. J. Diverse patterns of correspondence between protist metabarcodes and protist metagenome-assembled genomes. PLOS ONE 19, e0303697 (2024).

  78. Wickham, H. Ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag, 2016).

  79. Aphalo, P. ggpmisc: Miscellaneous Extensions to ‘ggplot2’. (Springer, 2025).

  80. Shulgina, Y. & Eddy, S. R. Codetta: predicting the genetic code from nucleotide sequence. Bioinformatics 39, btac802 (2022).

  81. Jamy, M. Data for ‘Identification of a deep-branching lineage of algae using environmental plastid genomes’. figshare https://doi.org/10.17044/scilifelab.28212173 (2025).

  82. Jamy, M. Code for ‘Identification of a deep-branching lineage of algae using environmental plastid genomes’. Zenodo https://doi.org/10.5281/zenodo.17635604 (2025).

  83. Keeling, P. J. & Eglit, Y. Openly available illustrations as tools to describe eukaryotic microbial diversity. PLoS Biol. 21, e3002395 (2023).

Acknowledgements

We thank Shinichi Sunagawa for facilitating the recovery of relevant data from the mOTU metagenomic database maintained by his research group at the Department of Biology at ETH Zürich. We thank A. Roger, H. Baños, and C. McCarthey for discussions, and for kindly providing custom scripts for running the phylogenetic models MEOW and GF-MIX. We thank J.E. Dharamshi for discussions about phylogenetic analyses. Our survey was made possible by the sampling and sequencing efforts of the Tara Oceans Project. Tara Oceans (which includes the Tara Oceans and Tara Oceans Polar Circle expeditions) would not exist without the leadership of the Tara Oceans Foundation and the continuous support of 23 institutes (https://oceans.taraexpeditions.org/). This article is contribution number 164 of Tara Oceans. Phylogenetic analyses were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS 2024/5-197), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. M.J. was supported by the Swedish Research Council (International Postdoc grant 2022-00351). F.B.’s research is supported by grants from the European Research Council (ERC consolidator grant 101044505), the Swedish Research Council VR (2021-04055), and Science for Life Laboratory. T.O.D.’s research is supported by a grant from the Agence Nationale de la Recherche (ANR-23-CE02-0022). We also acknowledge the commitment of the CNRS and Genoscope/CEA. Some of the computations were performed using the platine, titane and curie HPC machines provided through GENCI grants (t2011076389, t2012076389, t2013036389, t2014036389, t2015036389 and t2016036389).

Funding

Open access funding provided by Uppsala University.

Author information

Contributions

F.B. and T.O.D. conceived the project. T.O.D. characterised the ptMAGs. M.J., F.B. and T.H. performed phylogenetic analyses. T.A., E.P. and T.O.D. created the plastid genomic database and performed surveys for nucleomorphs. H.J.R. and L.P. retrieved relevant data from mOTUs. M.J. and T.H. annotated the ptMAGs, and E.P. performed mapping analyses. F.B., M.J., and T.O.D. wrote the manuscript with input from all the authors.

Corresponding authors

Correspondence to Tom O. Delmont or Fabien Burki.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks John Archibald and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Descriptions of Additional Supplementary Files

Supplementary Dataset 1

Supplementary Dataset 2

Supplementary Dataset 3

Supplementary Dataset 4

Supplementary Dataset 5

Supplementary Dataset 6

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jamy, M., Huber, T., Antoine, T. et al. Identification of a deep-branching lineage of algae using environmental plastid genomes. Nat Commun (2025). https://doi.org/10.1038/s41467-025-67401-4

  • DOI: https://doi.org/10.1038/s41467-025-67401-4


Source: Ecology - nature.com
