Chromosome-level genome assembly of the dwarf cattail Typha minima


Abstract

We report the first chromosome-level genome assembly of the critically endangered dwarf cattail, Typha minima, a wetland species of ecological and medicinal importance. Utilizing PacBio HiFi long-read sequencing and Hi-C scaffolding technologies, we generated a high-quality 324.66 Mb genome, anchored onto 30 pseudochromosomes. The assembly demonstrates exceptional continuity, with contig and scaffold N50 values of 10.84 Mb and 10.90 Mb, respectively, and a near-complete chromosomal anchoring rate of 99.65%. It exhibits outstanding completeness, as reflected by a BUSCO score of 99.2%, and contains 33.20% repetitive sequences. We annotated 34,541 protein-coding genes, with 96.42% receiving functional assignments. The assembly also includes annotations for non-coding RNAs, comprising 1,261 rRNAs, 230 miRNAs, and 467 tRNAs. Integrated orthology analysis identified 10,055 consensus orthologs across five functional databases. This high-quality genomic resource provides a foundation for advancing studies in evolutionary adaptation and conservation genomics of this endangered wetland plant.
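For readers less familiar with the continuity metric quoted above, N50 is the length L such that sequences of length L or longer together cover at least half of the total assembly. The following is a minimal sketch of that definition, not the authors' pipeline: the function name `n50` and the toy lengths are illustrative assumptions, and only the 324.66 Mb assembly size and the 99.65% anchoring rate come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not taken from the Typha minima assembly).
toy_lengths = [8, 6, 5, 3, 2]
toy_n50 = n50(toy_lengths)

# Anchored sequence implied by the abstract: 99.65% of 324.66 Mb.
anchored_mb = round(324.66 * 0.9965, 2)

print(toy_n50, anchored_mb)
```

Under these figures, roughly 323.52 Mb of the assembly is placed on the 30 pseudochromosomes, leaving a little over 1 Mb unanchored.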

Data availability

Raw sequence data have been deposited in the Genome Sequence Archive (GSA) at the National Genomics Data Center (NGDC) under BioProject accession number PRJCA042646 (ref. 53). The accessions for the genome survey, PacBio HiFi, Hi-C, RNA-seq, and Iso-Seq reads are CRA027553 (ref. 54), CRA027595 (ref. 55), CRA027574 (ref. 56), CRA027597 (ref. 57), and CRA027598 (ref. 58), respectively. The genome assembly has been deposited in the Genome Warehouse (GWH) under accession number GWHGEOG00000000.1 (ref. 59). All raw data and the assembly are also available in the European Nucleotide Archive (ENA) under project PRJEB102980 (ref. 60), with the following accessions: ERR15860102 (survey; ref. 62); ERR15862836 (Iso-Seq; ref. 63); ERR15874483 (HiFi; ref. 64); ERR15873639 (Hi-C; ref. 65); ERR15873620, ERR15873621, ERR15873622 and ERR15873623 (RNA-seq; refs. 66–69); and GCA_977063535 (genome assembly; ref. 61).
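The read-level accessions can be collected into a small lookup so that the landing pages are generated rather than typed by hand. This is a sketch only: the names `GSA_RUNS`, `ENA_RUNS` and `landing_pages` are hypothetical, the accessions are those given in references 54–58 and 62–69, and the URL patterns are copied from the same reference entries. It builds links only and does not download any data.

```python
# URL patterns as they appear in the reference list (refs. 54-58 and 62-69).
GSA_BROWSE = "https://ngdc.cncb.ac.cn/gsa/browse/{}"
ENA_SRA = "https://identifiers.org/insdc.sra:{}"

# One GSA accession per data type.
GSA_RUNS = {
    "survey": "CRA027553",
    "HiFi": "CRA027595",
    "Hi-C": "CRA027574",
    "RNA-seq": "CRA027597",
    "Iso-Seq": "CRA027598",
}

# ENA run accessions per data type (RNA-seq has four runs).
ENA_RUNS = {
    "survey": ["ERR15860102"],
    "Iso-Seq": ["ERR15862836"],
    "HiFi": ["ERR15874483"],
    "Hi-C": ["ERR15873639"],
    "RNA-seq": ["ERR15873620", "ERR15873621", "ERR15873622", "ERR15873623"],
}


def landing_pages():
    """Map each data type to its GSA and ENA landing-page URLs."""
    pages = {}
    for kind, accession in GSA_RUNS.items():
        pages.setdefault(kind, []).append(GSA_BROWSE.format(accession))
    for kind, accessions in ENA_RUNS.items():
        pages.setdefault(kind, []).extend(ENA_SRA.format(a) for a in accessions)
    return pages


for kind, urls in landing_pages().items():
    print(kind, *urls, sep="\n  ")
```

Keeping the accessions in one mapping makes it easy to check them against the reference list before fetching anything.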

Code availability

All bioinformatics tools and software used for genome assembly, annotation, and data analysis in this study were operated strictly according to their official user manuals, with no custom code employed. Software versions and parameters are comprehensively documented in the Methods section.

References

  1. Bansal, S. et al. Typha (Cattail) invasion in North American wetlands: biology, regional problems, impacts, ecosystem services, and management. Wetlands 39, 645–684 (2019).

  2. Carpenter, S. R. & Lodge, D. M. Effects of submersed macrophytes on ecosystem processes. Aquat. Bot. 26, 341–370 (1986).

  3. Lewis, M. & Thursby, G. Aquatic plants: Test species sensitivity and minimum data requirement evaluations for chemical risk assessments and aquatic life criteria development for the USA. Environ. Pollut. 238, 270–280 (2018).

  4. Thomaz, S. M. & Cunha, E. R. The role of macrophytes in habitat structuring in aquatic ecosystems: methods of measurement, causes and consequences on animal assemblages’ composition and biodiversity. Acta Limnol. Bras. 22, 218–236 (2010).

  5. Alufasi, R. et al. Internalisation of Salmonella spp. by Typha latifolia and Cyperus papyrus in vitro and implications for pathogen removal in Constructed Wetlands. Environ. Technol. 43, 949–961 (2022).

  6. National Pharmacopoeia Commission. Pharmacopoeia of the People’s Republic of China (2020 edition): Volume I. China Medical Science Press, Beijing (2020).

  7. Smith, S. G. Typha: its taxonomy and the ecological significance of hybrids. Arch. Hydrobiol. 27, 129–138 (1987).

  8. Csencsics, D. et al. La petite massette: Habitant menacé d’un biotope rare. Notice pour le praticien 43. Institut fédéral de recherches WSL, Birmensdorf (2008).

  9. Zhou, B. et al. Revised phylogeny and historical biogeography of the cosmopolitan aquatic plant genus Typha (Typhaceae). Sci. Rep. 8, 8813 (2018).

  10. Liao, Y. et al. Chromosome-level genome and high nitrogen stress response of the widespread and ecologically important wetland plant Typha angustifolia. Front. Plant Sci. 14, 1138498 (2023).

  11. Widanagama, S. D., Freeland, J. R., Xu, X. & Shafer, A. B. Genome assembly, annotation, and comparative analysis of the cattail Typha latifolia. G3 12, jkab401 (2022).

  12. Aleman, A. et al. Development of genomic resources for cattails (Typha), a globally important macrophyte genus. Freshw. Biol. 69, 74–83 (2024).

  13. Allen, G. C., Flores-Vergara, M. A., Krasynanski, S., Kumar, S. & Thompson, W. F. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat. Protoc. 1, 2320–2325 (2006).

  14. Arita, M., Karsch-Mizrachi, I. & Cochrane, G. The international nucleotide sequence database collaboration. Nucleic Acids Res. 49, D121–D124 (2021).

  15. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).

  16. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).

  17. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).

  18. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).

  19. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with Hifiasm. Nat. Methods 18, 170–175 (2021).

  20. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).

  21. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).

  22. Gertz, E. M., Yu, Y. K., Agarwala, R., Schäffer, A. A. & Altschul, S. F. Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST. BMC Biol. 4, 41 (2006).

  23. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Combining gene prediction methods with alignment information in the AUGUSTUS gene finder. Bioinformatics 22, 417–425 (2006).

  24. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).

  25. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  26. Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).

  27. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).

  28. van Dijk, M. & Bonvin, A. M. J. J. 3D-DART: a DNA structure modelling server. Nucleic Acids Res. 37, W235–W239 (2009).

  29. Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258.e1 (2018).

  30. Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 48, W177–W184 (2020).

  31. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).

  32. Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).

  33. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 18 (2008).

  34. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).

  35. Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).

  36. Tempel, S., Jurka, M. & Jurka, J. VisualRepbase: an interface for the study of occurrences of transposable element families. BMC Bioinform. 9, 345 (2008).

  37. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. 25, 4.10.1–4.10.14 (2009).

  38. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 117, 9451–9457 (2020).

  39. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).

  40. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).

  41. Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2003).

  42. Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol. Biol. 1962, 161–177 (2019).

  43. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).

  44. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).

  45. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).

  46. Bruna, T., Lomsadze, A. & Borodovsky, M. A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes. bioRxiv 2023.01.13.524024 (2024).

  47. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).

  48. Sayers, E. W. et al. GenBank. Nucleic Acids Res. 49, D92–D96 (2021).

  49. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).

  50. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).

  51. The Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).

  52. Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2021).

  53. NGDC BioProject https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA042646 (2024).

  54. Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027553 (2024).

  55. Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027595 (2024).

  56. Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027574 (2024).

  57. Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027597 (2024).

  58. Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027598 (2024).

  59. NGDC Genome Warehouse https://ngdc.cncb.ac.cn/gwh/Assembly/98240/show (2024).

  60. European Nucleotide Archive https://identifiers.org/ena.embl:PRJEB102980 (2025).

  61. European Nucleotide Archive https://identifiers.org/insdc.gca:GCA_977063535 (2025).

  62. European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15860102 (2025).

  63. European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15862836 (2025).

  64. European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15874483 (2025).

  65. European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873639 (2025).

  66. European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873620 (2025).

  67. European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873621 (2025).

  68. European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873622 (2025).

  69. European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873623 (2025).

Acknowledgements

This study was supported by the National Water Pollution Control and Treatment Science and Technology Major Project, China (No. 2015ZX07503005). The calculations in this paper were performed using the supercomputing system at the Supercomputing Center of Wuhan University.

Author information

Contributions

X.X. designed the research; L.H. carried out the field collections; J.D. carried out the experiments and performed the data analysis; J.D., L.H. and X.X. wrote and revised the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Xinwei Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Du, J., Huang, L. & Xu, X. Chromosome-level genome assembly of the dwarf cattail Typha minima. Sci Data (2026). https://doi.org/10.1038/s41597-026-06547-2


Source: Ecology - nature.com
