Telomere-to-Telomere genome assembly of an endangered tree Berchemiella wilsonii (Rhamnaceae)


Abstract

Berchemiella wilsonii (Schneid.) Nakai is one of China's National Key Protected Wild Plants and one of the 120 Plant Species with Extremely Small Populations in China. By combining PacBio HiFi, DNBSEQ and Hi-C sequencing, we assembled a chromosome-scale, haplotype-resolved genome for B. wilsonii. The final haplotype A and haplotype B assemblies are 216.84 Mb and 217.69 Mb, with contig N50 lengths of 18.24 Mb and 18.03 Mb, respectively, and each is anchored to the 12 chromosomes of the haploid set (2n = 24). All chromosome ends carry the characteristic telomeric motif (TTTAGGG), and only haplotype A retains a single gap, so both haplotypes are close to telomere-to-telomere (T2T) assemblies. BUSCO analysis indicated extremely high assembly completeness (98.9% complete BUSCO genes). The genome contains 22,828 protein-coding genes, of which 21,992 (96.34%) were functionally annotated. This is the first reported genome of B. wilsonii, and this high-quality resource will provide new insights into its evolutionary history and help address taxonomic classification challenges in endangered plant species.
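The abstract summarises assembly quality through contig N50 and the presence of telomeric motifs at every chromosome end. As a rough illustration of how such figures are commonly checked, the Python sketch below computes N50 from a list of contig lengths and looks for tandem TTTAGGG repeats (and their reverse complement, CCCTAAA) near both ends of a sequence. It is not the authors' pipeline: the 1,000 bp end window, the five-copy threshold and the function names are hypothetical choices made for illustration.

```python
# Illustrative assembly-QC sketch (not the authors' pipeline).
# Hypothetical choices: 1,000 bp end window and >= 5 exact tandem motif copies.

TELO_FWD = "TTTAGGG"  # telomeric motif reported for B. wilsonii (G-rich strand)
TELO_REV = "CCCTAAA"  # its reverse complement, seen at the left end of a sequence


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def has_tandem_motif(segment, motif, min_copies=5):
    """True if segment contains min_copies perfect, back-to-back copies of motif."""
    return motif * min_copies in segment.upper()


def telomere_status(seq, window=1000, min_copies=5):
    """Return (left_end_ok, right_end_ok) for one chromosome sequence.

    Read left to right, a chromosome shows C-rich CCCTAAA repeats at its
    start and G-rich TTTAGGG repeats at its end.
    """
    left = has_tandem_motif(seq[:window], TELO_REV, min_copies)
    right = has_tandem_motif(seq[-window:], TELO_FWD, min_copies)
    return left, right


if __name__ == "__main__":
    demo = "CCCTAAA" * 10 + "ACGT" * 500 + "TTTAGGG" * 10
    print(n50([10, 8, 5, 3, 2]))   # 8
    print(telomere_status(demo))   # (True, True)
```

Exact substring matching keeps the sketch short; dedicated telomere-detection tools typically tolerate mismatches and report repeat counts rather than a yes/no flag.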

Data availability

The raw PacBio HiFi, DNBSEQ and Hi-C sequencing data generated in this study have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1229120. The PacBio HiFi, DNBSEQ and Hi-C data are publicly accessible under accessions SRR35779783 [67], SRR35779785 [68] and SRR35779784 [69], respectively. The raw data have also been deposited in the National Genomics Data Center (NGDC, https://ngdc.cncb.ac.cn/) under accession CRA024114 [70] within BioProject PRJCA037665. RNA-seq data for root, stem and leaf tissues have been deposited in the NCBI SRA under accessions SRR32783994 [71], SRR32767770 [72] and SRR32783993 [73], respectively. The final genome assemblies have been deposited in NCBI GenBank under accessions JBMIOF000000000 [74] and JBMIOG000000000 [75] for haplotype A and haplotype B, respectively. The genome annotation files are available from the Figshare database [76].
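The accessions listed above resolve through the URL patterns given in references 67–75 (identifiers.org for SRA and GenBank) and reference 70 (NGDC GSA browser). A small, hypothetical Python helper that assembles those links is sketched below; the dictionary keys and function names are illustrative, and only the accession strings and URL prefixes come from this article.

```python
# Hypothetical helper: build resolvable links for the accessions in Data availability.
# URL prefixes are copied from references 67-75 (identifiers.org) and 70 (NGDC GSA).

SRA_RUNS = {
    "PacBio HiFi": "SRR35779783",
    "DNBSEQ": "SRR35779785",
    "Hi-C": "SRR35779784",
    "RNA-seq root": "SRR32783994",
    "RNA-seq stem": "SRR32767770",
    "RNA-seq leaf": "SRR32783993",
}
GENBANK = {
    "haplotype A": "JBMIOF000000000",
    "haplotype B": "JBMIOG000000000",
}
NGDC = {"NGDC raw data": "CRA024114"}


def sra_url(run):
    """identifiers.org link for an SRA run accession."""
    return f"https://identifiers.org/ncbi/insdc.sra:{run}"


def genbank_url(acc):
    """identifiers.org link for an INSDC (GenBank) WGS accession."""
    return f"https://identifiers.org/ncbi/insdc:{acc}"


def ngdc_url(acc):
    """NGDC Genome Sequence Archive browse page for a CRA accession."""
    return f"https://ngdc.cncb.ac.cn/gsa/browse/{acc}"


def all_links():
    """Map each dataset label to its public URL."""
    links = {name: sra_url(acc) for name, acc in SRA_RUNS.items()}
    links.update({name: genbank_url(acc) for name, acc in GENBANK.items()})
    links.update({name: ngdc_url(acc) for name, acc in NGDC.items()})
    return links


if __name__ == "__main__":
    for name, url in all_links().items():
        print(f"{name}\t{url}")
```

Keeping accessions in one table and deriving URLs from it avoids the copy-paste errors that arise when accession numbers and citation markers are typed side by side.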

Code availability

We did not use any custom code in this study. The versions and parameters of the bioinformatic tools used are described in the Methods section, which also states every parameter that was set to a non-default value.

References

  1. Fu, L. K. & Jin, J. M. China plant red data book: rare and endangered plants (Science Press, 1992).

  2. Li, J. et al. Rediscovery of Berchemiella wilsonii (Schneid.) Nakai (Rhamnaceae), an endangered species from Hubei, China. Acta Phytotaxon. Sin. 42, 86–88 (2004).

  3. Pang, J. H. et al. Population structure and dynamic characteristics of endangered plant species (Berchemiella wilsonii) and its variety Berchemiella wilsonii var. pubipetiolata. Guihaia 45, 108–120 (2025).

  4. Qian, H. A study on the genus Berchemiella Nakai (Rhamnaceae) endemic to eastern Asia. Bulletin of Botanical Research 8, 119–128 (1988).

  5. Kang, M., Jiang, M. X. & Huang, H. W. Genetic diversity in fragmented populations of Berchemiella wilsonii var. pubipetiolata (Rhamnaceae). Ann. Bot. 95, 1145–1151 (2005).

  6. Kang, M., Xu, F. H., Lowe, A. & Huang, H. W. Protecting evolutionary significant units for the remnant populations of Berchemiella wilsonii var. pubipetiolata (Rhamnaceae). Conserv. Genet. 8, 465–473 (2007).

  7. Iwatsuki, K., Boufford, D. E. & Ohba, H. Flora of Japan, Vol. IIc (Kodansha Ltd., 1999).

  8. Kang, M., Wang, J. & Huang, H. W. Demographic bottlenecks and low gene flow in remnant populations of the critically endangered Berchemiella wilsonii var. pubipetiolata (Rhamnaceae) inferred from microsatellite markers. Conserv. Genet. 9, 191–199 (2008).

  9. Chang, C. S., Kim, H. & Park, T. Y. Patterns of allozyme diversity in several selected rare species in Korea and implications for conservation. Biodivers. Conserv. 12, 529–544 (2003).

  10. Dang, H. S., Zhang, Y. J., Jiang, M. X., Huang, H. D. & Jin, X. A Preliminary Study on Dormancy and Germination Physiology of Endangered Species Berchemiella wilsonii (Schneid.) Nakai var. pubipetiolata H. Qian Seeds. Plant Science Journal 23, 327–331 (2005).

  11. Yang, Y. Z. et al. Genomic effects of population collapse in a critically endangered ironwood tree Ostrya rehderiana. Nat. Commun. 9, 5449 (2018).

  12. Zhao, Y. P. et al. Resequencing 545 ginkgo genomes across the world reveals the evolutionary history of the living fossil. Nat. Commun. 10, 4201 (2019).

  13. Li, H. D., Ding, J. L. & He, X. Berchemiella wilsonii: a new plant record from Zhejiang discovered in Shengzhou. Journal of Zhejiang A&F University 29, 639–640 (2012).

  14. Karbstein, K. et al. Species delimitation 4.0: integrative taxonomy meets artificial intelligence. Trends Ecol. Evol. 39, 771–784 (2024).

  15. Hu, Y. B. et al. Genomic evidence for two phylogenetic species and long-term population bottlenecks in red pandas. Sci. Adv. 6, eaax5751 (2020).

  16. Song, B. et al. Plant genome resequencing and population genomics: Current status and future prospects. Mol. Plant. 16, 1252–1268 (2023).

  17. Li, H. & Durbin, R. Genome assembly in the telomere-to-telomere era. Nat. Rev. Genet. 25, 658–670 (2024).

  18. Li, K. et al. Haplotype-resolved T2T reference genomes for wild and domesticated accessions shed new insights into the domestication of jujube. Hortic. Res. 11, uhae071 (2024).

  19. Vilanova, S. et al. SILEX: a fast and inexpensive high-quality DNA extraction method suitable for multiple sequencing platforms and recalcitrant plant species. Plant Methods 16, 110 (2020).

  20. Chen, S. F., Zhou, Y. Q., Chen, Y. R. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).

  21. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).

  22. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).

  23. Cheng, H. Y., Concepcion, G. T., Feng, X. W., Zhang, H. W. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  24. Zhang, X. T., Zhang, S. C., Zhao, Q., Ming, R. & Tang, H. B. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019).

  25. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).

  26. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  27. Jain, C. et al. Weighted minimizer sampling improves long read mapping. Bioinformatics 36, i111–i118 (2020).

  28. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).

  29. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

  30. Fu, L. M., Niu, B. F., Zhu, Z. W., Wu, S. T. & Li, W. Z. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).

  31. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).

  32. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).

  33. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).

  34. Ou, S. & Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176, 1410–1422 (2018).

  35. Abrusán, G., Grundmann, N., DeMester, L. & Makalowski, W. TEclass-a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).

  36. Tarailo-Graovac, M. & Chen, N. S. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10 (2009).

  37. Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics 33, 2583–2585 (2017).

  38. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  39. Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).

  40. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).

  41. Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).

  42. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).

  43. Delcher, A. L. et al. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23, 673–679 (2007).

  44. Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).

  45. Dainat, J. Another Gtf/Gff Analysis Toolkit (AGAT): Resolve interoperability issues and accomplish more with your annotations. https://zenodo.org/records/13799920 (2024).

  46. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).

  47. Nevers, Y. et al. Quality assessment of gene repertoire annotations with OMArk. Nat. Biotechnol. 43, 124–133 (2025).

  48. Sommer, M. J., Zimin, A. V. & Salzberg, S. L. PSAURON: a tool for assessing protein annotation across a broad range of species. NAR Genomics Bioinforma. 7, lqae189 (2025).

  49. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).

  50. Deng, Y. Y. et al. Integrated nr database in protein annotation system and its localization. Computer Engineering 32, 71–72 (2006).

  51. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003).

  52. Blum, M. et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 49, D344–D354 (2020).

  53. Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2019).

  54. Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. EggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).

  55. Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).

  56. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).

  57. Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2020).

  58. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).

  59. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).

  60. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).

  61. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).

  62. Yang, Z. H. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

  63. Han, M. V., Thomas, G. W. C., Lugo-Martinez, J. & Hahn, M. W. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 30, 1987–1997 (2013).

  64. Wang, Y. P. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49, https://doi.org/10.1093/nar/gkr1293 (2012).

  65. Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010).

  66. Tang, H. B. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).

  67. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35779783 (2025).

  68. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35779785 (2025).

  69. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35779784 (2025).

  70. NGDC Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA024114 (2025).

  71. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR32783994 (2025).

  72. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR32767770 (2025).

  73. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR32783993 (2025).

  74. Pang, J. H. GenBank, https://identifiers.org/ncbi/insdc:JBMIOF000000000 (2025).

  75. Pang, J. H. GenBank, https://identifiers.org/ncbi/insdc:JBMIOG000000000 (2025).

  76. Pang, J. H. T2T genome assembly and annotation of Berchemiella wilsonii. figshare https://doi.org/10.6084/m9.figshare.28869281 (2025).

  77. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).

  78. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (U2571202; 32371653) and the National Key R&D Program of China (2024YFF1307400).

Author information

Contributions

X.W. and M.J. conceived and supervised the study. J.P., Z.X., Y.W., Y.C., S.S., H.W. and X.W. performed the experiments and analysed the data. J.P. and X.W. wrote the draft of the manuscript. All authors reviewed and contributed to the final version of the manuscript.

Corresponding author

Correspondence to Xinzeng Wei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Pang, J., Xiao, Z., Wang, Y. et al. Telomere-to-Telomere genome assembly of an endangered tree Berchemiella wilsonii (Rhamnaceae).
Sci Data (2025). https://doi.org/10.1038/s41597-025-06433-3

  • DOI: https://doi.org/10.1038/s41597-025-06433-3


Source: Ecology - nature.com