Abstract
We report the first chromosome-level genome assembly of the critically endangered dwarf cattail, Typha minima, a wetland species of ecological and medicinal importance. Using PacBio HiFi long-read sequencing and Hi-C scaffolding, we generated a 324.66 Mb genome assembly anchored onto 30 pseudochromosomes. The assembly is highly contiguous, with contig and scaffold N50 values of 10.84 Mb and 10.90 Mb, respectively, and a chromosomal anchoring rate of 99.65%. It is also highly complete, with a BUSCO score of 99.2%. Repetitive sequences account for 33.20% of the genome. We annotated 34,541 protein-coding genes, 96.42% of which received functional assignments. Non-coding RNA annotation identified 1,261 rRNAs, 230 miRNAs, and 467 tRNAs. Integrated orthology analysis identified 10,055 consensus orthologs across five functional databases. This genomic resource provides a foundation for studies of evolutionary adaptation and conservation genomics in this endangered wetland plant.
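Contig and scaffold N50 are the continuity metrics quoted above. Under the standard definition, N50 is the length L such that sequences of length L or longer together contain at least half of the assembled bases. The following is a minimal Python sketch of that calculation; it is not the tool the authors used, and the toy lengths are hypothetical rather than taken from the T. minima assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length at which the running total first
    reaches at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Unreachable: the running total always reaches the full total.
    raise RuntimeError("N50 could not be determined")


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units), not real data.
    toy_contigs = [10, 8, 6, 4, 2]
    print(n50(toy_contigs))  # 8, because 10 + 8 = 18 >= 30 / 2
```

Passing scaffold lengths instead of contig lengths gives scaffold N50. In practice the lengths could be taken from the second column of a samtools faidx `.fai` index of the assembly FASTA.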
Data availability
Raw sequence data have been deposited in the Genome Sequence Archive (GSA) at the National Genomics Data Center (NGDC) under BioProject accession PRJCA042646 (ref. 53). The accessions for the genome survey, PacBio HiFi, Hi-C, RNA-seq, and Iso-Seq reads are CRA027553, CRA027595, CRA027574, CRA027597, and CRA027598, respectively (refs. 54–58). The genome assembly has been deposited in the Genome Warehouse (GWH) under accession GWHGEOG00000000.1 (ref. 59). All raw data and the assembly are also available from the European Nucleotide Archive (ENA) under project PRJEB102980 (ref. 60), with the following accessions: ERR15860102 (survey; ref. 62), ERR15862836 (Iso-Seq; ref. 63), ERR15874483 (HiFi; ref. 64), ERR15873639 (Hi-C; ref. 65), ERR15873620, ERR15873621, ERR15873622 and ERR15873623 (RNA-seq; refs. 66–69), and GCA_977063535 (genome assembly; ref. 61).
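The accessions above can be organized programmatically, for example to script downloads. The sketch below is a hypothetical helper, not part of the authors' workflow: it groups the ENA run and GSA accessions by data type and builds resolver URLs following the patterns used in the reference list (identifiers.org for INSDC runs, the NGDC GSA browse page), plus a query URL for the ENA Portal API filereport endpoint. The endpoint parameters shown (accession, result, fields) are assumptions to check against the current ENA documentation before use.

```python
from urllib.parse import urlencode

# ENA run accessions from the Data availability statement, grouped by data type.
ENA_RUNS: dict[str, list[str]] = {
    "survey": ["ERR15860102"],
    "Iso-Seq": ["ERR15862836"],
    "HiFi": ["ERR15874483"],
    "Hi-C": ["ERR15873639"],
    "RNA-seq": ["ERR15873620", "ERR15873621", "ERR15873622", "ERR15873623"],
}

# NGDC GSA accessions for the same data types.
GSA_ACCESSIONS: dict[str, str] = {
    "survey": "CRA027553",
    "HiFi": "CRA027595",
    "Hi-C": "CRA027574",
    "RNA-seq": "CRA027597",
    "Iso-Seq": "CRA027598",
}

ENA_PROJECT = "PRJEB102980"


def ena_run_url(run: str) -> str:
    """Resolver URL for an INSDC run, following the pattern in the reference list."""
    return f"https://identifiers.org/insdc.sra:{run}"


def gsa_url(accession: str) -> str:
    """NGDC GSA browse page for a GSA accession."""
    return f"https://ngdc.cncb.ac.cn/gsa/browse/{accession}"


def ena_filereport_url(project: str) -> str:
    """ENA Portal API query (assumed parameters) listing the runs of a project."""
    params = {
        "accession": project,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
    }
    return "https://www.ebi.ac.uk/ena/portal/api/filereport?" + urlencode(params)


if __name__ == "__main__":
    # Print one GSA page and the ENA run URLs for each data type.
    for data_type, runs in ENA_RUNS.items():
        print(f"{data_type}: {gsa_url(GSA_ACCESSIONS[data_type])}")
        for run in runs:
            print(f"  {ena_run_url(run)}")
    print(ena_filereport_url(ENA_PROJECT))
```

The script only builds URLs and makes no network requests. Long-read runs such as HiFi and Iso-Seq may be stored as submitted BAM files rather than FASTQ, in which case a field such as `submitted_ftp` may be needed in the filereport query.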
Code availability
All bioinformatics tools used for genome assembly, annotation, and data analysis in this study were run according to their official documentation; no custom code was used. Software versions and parameters are documented in the Methods section.
References
Bansal, S. et al. Typha (Cattail) invasion in North American wetlands: biology, regional problems, impacts, ecosystem services, and management. Wetlands 39, 645–684 (2019).
Carpenter, S. R. & Lodge, D. M. Effects of submersed macrophytes on ecosystem processes. Aquatic Botany 26, 341–370 (1986).
Lewis, M. & Thursby, G. Aquatic plants: Test species sensitivity and minimum data requirement evaluations for chemical risk assessments and aquatic life criteria development for the USA. Environ. Pollut. 238, 270–280 (2018).
Thomaz, S. M. & Cunha, E. R. The role of macrophytes in habitat structuring in aquatic ecosystems: methods of measurement, causes and consequences on animal assemblages’ composition and biodiversity. Acta Limnol. Bras. 22, 218–236 (2010).
Alufasi, R. et al. Internalisation of Salmonella spp. by Typha latifolia and Cyperus papyrus in vitro and implications for pathogen removal in Constructed Wetlands. Environ. Technol. 43, 949–961 (2022).
National Pharmacopoeia Commission. Pharmacopoeia of the People’s Republic of China (2020 edition): Volume I. China Medical Science Press, Beijing (2020).
Smith, S. G. Typha: its taxonomy and the ecological significance of hybrids. Arch. Hydrobiol. 27, 129–138 (1987).
Csencsics, D. et al. La petite massette: Habitant menacé d’un biotope rare. Notice pour le praticien 43. Institut fédéral de recherches WSL, Birmensdorf (2008).
Zhou, B. et al. Revised phylogeny and historical biogeography of the cosmopolitan aquatic plant genus Typha (Typhaceae). Sci. Rep. 8, 8813 (2018).
Liao, Y. et al. Chromosome-level genome and high nitrogen stress response of the widespread and ecologically important wetland plant Typha angustifolia. Front. Plant Sci. 14, 1138498 (2023).
Widanagama, S. D., Freeland, J. R., Xu, X. & Shafer, A. B. Genome assembly, annotation, and comparative analysis of the cattail Typha latifolia. G3 Genes Genomes Genet. 12, jkab401 (2022).
Aleman, A. et al. Development of genomic resources for cattails (Typha), a globally important macrophyte genus. Freshw. Biol. 69, 74–83 (2024).
Allen, G. C., Flores-Vergara, M. A., Krasynanski, S., Kumar, S. & Thompson, W. F. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat. Protoc. 1, 2320–2325 (2006).
Arita, M., Karsch-Mizrachi, I. & Cochrane, G. The international nucleotide sequence database collaboration. Nucleic Acids Res. 49, D121–D124 (2021).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with Hifiasm. Nat. Methods 18, 170–175 (2021).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Gertz, E. M., Yu, Y. K., Agarwala, R., Schäffer, A. A. & Altschul, S. F. Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST. BMC Biol. 4, 41 (2006).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Combining gene prediction methods with alignment information in the AUGUSTUS gene finder. Bioinformatics 22, 417–425 (2006).
Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
van Dijk, M. & Bonvin, A. M. 3D-DART: a DNA structure modelling server. Nucleic Acids Res. 37, W235–W239 (2009).
Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258.e1 (2018).
Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 48, W177–W184 (2020).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
Tempel, S., Jurka, M. & Jurka, J. VisualRepbase: an interface for the study of occurrences of transposable element families. BMC Bioinformatics 9, 345 (2008).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 25, 4.10.1–4.10.14 (2009).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2003).
Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol. Biol. 1962, 161–177 (2019).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Bruna, T., Lomsadze, A. & Borodovsky, M. A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes. bioRxiv 2023.01.13.524024 (2024).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Sayers, E. W. et al. GenBank. Nucleic Acids Res. 49, D92–D96 (2021).
The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
The Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).
Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2021).
NGDC BioProject https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA042646 (2024).
Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027553 (2024).
Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027595 (2024).
Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027574 (2024).
Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027597 (2024).
Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA027598 (2024).
NGDC Genome Warehouse https://ngdc.cncb.ac.cn/gwh/Assembly/98240/show (2024).
European Nucleotide Archive https://identifiers.org/ena.embl:PRJEB102980 (2025).
European Nucleotide Archive https://identifiers.org/insdc.gca:GCA_977063535 (2025).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15860102 (2025).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15862836 (2025).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15874483 (2025).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873639 (2025).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873620 (2025).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873621 (2025).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873622 (2025).
European Nucleotide Archive https://identifiers.org/insdc.sra:ERR15873623 (2025).
Acknowledgements
This study was supported by the National Water Pollution Control and Treatment Science and Technology Major Project, China (No. 2015ZX07503005). The calculations in this paper were performed using the supercomputing system at the Supercomputing Center of Wuhan University.
Author information
Contributions
X.X. designed the research; L.H. carried out the field collections; J.D. carried out the experiments and performed the data analysis; J.D., L.H. and X.X. wrote and revised the manuscript. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Tables
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Du, J., Huang, L. & Xu, X. Chromosome-level genome assembly of the dwarf cattail Typha minima. Sci Data (2026). https://doi.org/10.1038/s41597-026-06547-2
DOI: https://doi.org/10.1038/s41597-026-06547-2

