Telomere-to-Telomere genome assembly of an endangered tree Berchemiella wilsonii (Rhamnaceae)

Abstract

Berchemiella wilsonii (Schneid.) Nakai is one of China's National Key Protected Wild Plants and one of the 120 Plant Species with Extremely Small Populations in China. By combining PacBio HiFi, DNBSEQ and Hi-C sequencing, we assembled a chromosome-scale, haplotype-resolved genome for B. wilsonii. The final haplotype A and haplotype B assemblies are 216.84 Mb and 217.69 Mb, with contig N50 lengths of 18.24 Mb and 18.03 Mb, respectively, and are anchored to the 2n = 24 chromosomes (12 per haplotype). Telomeric repeat motifs (TTTAGGG) were detected at all chromosome ends. Haplotype B is gap-free and haplotype A contains a single gap, so both assemblies are close to telomere-to-telomere (T2T) completeness. BUSCO analysis indicated very high assembly completeness (98.9% complete BUSCO genes). The genome contains 22,828 protein-coding genes, of which 21,992 (96.34%) were functionally annotated. This is the first reported genome of B. wilsonii, and this high-quality assembly should provide new insights into the evolutionary history and taxonomic challenges of endangered plant species.
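Two of the headline assembly metrics above, contig N50 and telomeric repeats at every chromosome end, can be illustrated with a short Python sketch. It is not the authors' pipeline (their tools and parameters are given in the Methods); the 10 kb end window and the 20-copy threshold used to call a telomere are hypothetical values chosen only for illustration, and sequences are assumed to be already loaded as plain strings.

```python
# Minimal sketch, not the authors' pipeline: contig N50 and a check for
# telomeric repeats (TTTAGGG) at both ends of an assembled chromosome.
# The `window` and `min_copies` defaults are hypothetical illustration values.


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half
    of the total assembled bases; returns 0 for an empty input."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def telomere_ends(seq, motif="TTTAGGG", window=10_000, min_copies=20):
    """Return (left_has_telomere, right_has_telomere) for one sequence.

    On the forward strand the G-rich motif (TTTAGGG) sits at the right
    (3') end and its reverse complement (CCCTAAA) at the left (5') end.
    str.count counts non-overlapping motif copies inside each end window.
    """
    seq = seq.upper()
    motif = motif.upper()
    left = seq[:window].count(reverse_complement(motif)) >= min_copies
    right = seq[-window:].count(motif) >= min_copies
    return left, right


if __name__ == "__main__":
    # Toy chromosome: 30 CCCTAAA copies, 400 bp of filler, 30 TTTAGGG copies.
    toy = "CCCTAAA" * 30 + "ACGT" * 100 + "TTTAGGG" * 30
    print(n50([2, 4, 6, 8, 10]))           # 8
    print(telomere_ends(toy, window=300))  # (True, True)
```

Because a chromosome is read 5' to 3' on the forward strand, the G-rich TTTAGGG array is expected at the right-hand end and its reverse complement, CCCTAAA, at the left-hand end; searching only one orientation at both ends would miss half of the real telomeres.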


Data availability

The raw PacBio HiFi, DNBSEQ and Hi-C sequencing data generated in this study have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1229120 and are publicly accessible under run accessions SRR35779783⁶⁷, SRR35779785⁶⁸ and SRR35779784⁶⁹, respectively. The raw data were also deposited in the National Genomics Data Center (NGDC, https://ngdc.cncb.ac.cn/) under accession CRA024114⁷⁰ (BioProject accession PRJCA037665). RNA-seq data for root, stem and leaf tissues have been deposited in the NCBI SRA under accessions SRR32783994⁷¹, SRR32767770⁷² and SRR32783993⁷³, respectively. The final genome assemblies of haplotype A and haplotype B have been deposited in NCBI GenBank under accessions JBMIOF000000000⁷⁴ and JBMIOG000000000⁷⁵, respectively. The genome annotation files are available from the Figshare database⁷⁶.
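The accessions above can be turned into resolvable links programmatically. The Python sketch below reuses the three URL templates that appear in this article's reference list (identifiers.org for SRA runs and GenBank WGS records, and the NGDC GSA browse page for CRA accessions); the prefix-based dispatch and the dataset labels are illustrative assumptions, not an official NCBI or NGDC rule.

```python
# Sketch: resolver URLs for the accessions listed in the Data availability
# statement. Templates are copied from the article's reference list; choosing
# a template by accession prefix is an illustrative assumption.

TEMPLATES = {
    "SRR": "https://identifiers.org/ncbi/insdc.sra:{}",  # SRA runs
    "CRA": "https://ngdc.cncb.ac.cn/gsa/browse/{}",      # NGDC GSA
}
WGS_TEMPLATE = "https://identifiers.org/ncbi/insdc:{}"   # GenBank WGS records

# Hypothetical labels; accessions are as given in the text.
DATASETS = {
    "PacBio HiFi": "SRR35779783",
    "DNBSEQ": "SRR35779785",
    "Hi-C": "SRR35779784",
    "Raw data (NGDC)": "CRA024114",
    "RNA-seq, root": "SRR32783994",
    "RNA-seq, stem": "SRR32767770",
    "RNA-seq, leaf": "SRR32783993",
    "Assembly, haplotype A": "JBMIOF000000000",
    "Assembly, haplotype B": "JBMIOG000000000",
}


def accession_url(accession):
    """Return a resolver URL, picking the template by the 3-letter prefix."""
    template = TEMPLATES.get(accession[:3], WGS_TEMPLATE)
    return template.format(accession)


if __name__ == "__main__":
    for label, acc in DATASETS.items():
        print(f"{label:24} {acc:16} {accession_url(acc)}")
```

Any unrecognised prefix falls through to the GenBank WGS template, which is only appropriate for the two assembly accessions listed here; BioProject identifiers such as PRJNA1229120 would need their own mapping.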

Code availability

No custom code was used in this study. The versions and parameters of all bioinformatic tools are described in the Methods section, where any parameter set to a non-default value is also stated.

References

Fu, L. K. & Jin, J. M. China plant red data book: rare and endangered plants (Science Press, 1992).
Li, J. et al. Rediscovery of Berchemiella wilsonii (Schneid.) Nakai (Rhamnaceae) an endangered species from Hubei, China. Acta Phytotaxon. Sin. 42, 86–88 (2004).
Pang, J. H. et al. Population structure and dynamic characteristics of endangered plant species (Berchemiella wilsonii) and its variety Berchemiella wilsonii var. pubipetiolata. Guihaia 45, 108–120 (2025).
Qian, H. A study on the genus Berchemiella Nakai (Rhamnaceae) endemic to eastern Asia. Bulletin of Botanical Research 8, 119–128 (1988).
Kang, M., Jiang, M. X. & Huang, H. W. Genetic diversity in fragmented populations of Berchemiella wilsonii var. pubipetiolata (Rhamnaceae). Ann. Bot. 95, 1145–1151 (2005).
Kang, M., Xu, F. H., Lowe, A. & Huang, H. W. Protecting evolutionary significant units for the remnant populations of Berchemiella wilsonii var. pubipetiolata (Rhamnaceae). Conserv. Genet. 8, 465–473 (2007).
Iwatsuki, K., Boufford, D. E. & Ohba, H. Flora of Japan, Vol. IIc (Kodansha Ltd., 1999).
Kang, M., Wang, J. & Huang, H. W. Demographic bottlenecks and low gene flow in remnant populations of the critically endangered Berchemiella wilsonii var. pubipetiolata (Rhamnaceae) inferred from microsatellite markers. Conserv. Genet. 9, 191–199 (2008).
Chang, C. S., Kim, H. & Park, T. Y. Patterns of allozyme diversity in several selected rare species in Korea and implications for conservation. Biodivers. Conserv. 12, 529–544 (2003).
Dang, H. S., Zhang, Y. J., Jiang, M. X., Huang, H. D. & Jin, X. A Preliminary Study on Dormancy and Germination Physiology of Endangered Species Berchemiella wilsonii (Schneid.) Nakai var. pubipetiolata H. Qian Seeds. Plant Science Journal 23, 327–331 (2005).
Yang, Y. Z. et al. Genomic effects of population collapse in a critically endangered ironwood tree Ostrya rehderiana. Nat. Commun. 9, 5449 (2018).
Zhao, Y. P. et al. Resequencing 545 ginkgo genomes across the world reveals the evolutionary history of the living fossil. Nat. Commun. 10, 4201 (2019).
Li, H. D., Ding, J. L. & He, X. Berchemiella wilsonii: a new plant record from Zhejiang discovered in Shengzhou. Journal of Zhejiang A&F University 29, 639–640 (2012).
Karbstein, K. et al. Species delimitation 4.0: integrative taxonomy meets artificial intelligence. Trends Ecol. Evol. 39, 771–784 (2024).
Hu, Y. B. et al. Genomic evidence for two phylogenetic species and long-term population bottlenecks in red pandas. Sci. Adv. 6, eaax5751 (2020).
Song, B. et al. Plant genome resequencing and population genomics: Current status and future prospects. Mol. Plant 16, 1252–1268 (2023).
Li, H. & Durbin, R. Genome assembly in the telomere-to-telomere era. Nat. Rev. Genet. 25, 658–670 (2024).
Li, K. et al. Haplotype-resolved T2T reference genomes for wild and domesticated accessions shed new insights into the domestication of jujube. Hortic. Res. 11, uhae071 (2024).
Vilanova, S. et al. SILEX: a fast and inexpensive high-quality DNA extraction method suitable for multiple sequencing platforms and recalcitrant plant species. Plant Methods 16, 110 (2020).
Chen, S. F., Zhou, Y. Q., Chen, Y. R. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Cheng, H. Y., Concepcion, G. T., Feng, X. W., Zhang, H. W. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Zhang, X. T., Zhang, S. C., Zhao, Q., Ming, R. & Tang, H. B. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Jain, C. et al. Weighted minimizer sampling improves long read mapping. Bioinformatics 36, i111–i118 (2020).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, 1–9 (2004).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Fu, L. M., Niu, B. F., Zhu, Z. W., Wu, S. T. & Li, W. Z. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
Ou, S. & Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
Abrusán, G., Grundmann, N., DeMester, L. & Makalowski, W. TEclass-a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).
Tarailo-Graovac, M. & Chen, N. S. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 4, Unit 4.10 (2009).
Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics 33, 2583–2585 (2017).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Delcher, A. L. et al. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23, 673–679 (2007).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).
Dainat, J. Another Gtf/Gff Analysis Toolkit (AGAT): Resolve interoperability issues and accomplish more with your annotations. https://zenodo.org/records/13799920 (2024).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Nevers, Y. et al. Quality assessment of gene repertoire annotations with OMArk. Nat. Biotechnol. 43, 124–133 (2025).
Sommer, M. J., Zimin, A. V. & Salzberg, S. L. PSAURON: a tool for assessing protein annotation across a broad range of species. NAR Genomics Bioinforma. 7, lqae189 (2025).
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
Deng, Y. Y. et al. Integrated nr database in protein annotation system and its localization. Computer Engineering 32, 71–72 (2006).
Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003).
Blum, M. et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 49, D344–D354 (2020).
Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2019).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. EggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2020).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Yang, Z. H. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Han, M. V., Thomas, G. W. C., Lugo-Martinez, J. & Hahn, M. W. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 30, 1987–1997 (2013).
Wang, Y. P. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49, https://doi.org/10.1093/nar/gkr1293 (2012).
Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010).
Tang, H. B. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35779783 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35779785 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR35779784 (2025).
NGDC Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA024114 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR32783994 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR32767770 (2025).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR32783993 (2025).
Pang, J. H. GenBank, https://identifiers.org/ncbi/insdc:JBMIOF000000000 (2025).
Pang, J. H. GenBank, https://identifiers.org/ncbi/insdc:JBMIOG000000000 (2025).
Pang, J. H. T2T genome assembly and annotation of Berchemiella wilsonii. figshare https://doi.org/10.6084/m9.figshare.28869281 (2025).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).


Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (U2571202 and 32371653) and the National Key R&D Program of China (2024YFF1307400).

Author information

These authors contributed equally: Jianghao Pang, Zhiqiang Xiao.

Authors and Affiliations

State Key Laboratory of Plant Diversity and Specialty Crops, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
Jianghao Pang, Shuaishuai Song, Hao Wu, Xinzeng Wei & Mingxi Jiang
University of Chinese Academy of Sciences, Beijing, 100049, China
Jianghao Pang, Shuaishuai Song, Hao Wu, Xinzeng Wei & Mingxi Jiang
China Three Gorges Corporation, Wuhan, 430010, China
Zhiqiang Xiao
Hubei Wufeng Houhe National Nature Reserve Administration, Wufeng, 443400, Hubei, China
Yeqing Wang & Yufen Cheng
Jixi Management Station of Anhui Qingliangfeng National Nature Reserve, Xuancheng, 245300, Anhui, China
Yunlong Tang

Authors

Jianghao Pang
Zhiqiang Xiao
Yeqing Wang
Yufen Cheng
Yunlong Tang
Shuaishuai Song
Hao Wu
Xinzeng Wei
Mingxi Jiang

Contributions

X.W. and M.J. conceived and supervised the study. J.P., Z.X., Y.W., Y.C., S.S., H.W. and X.W. performed the experiments and analysed the data. J.P. and X.W. drafted the manuscript. All authors reviewed and contributed to the final version of the manuscript.

Corresponding author

Correspondence to Xinzeng Wei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Pang, J., Xiao, Z., Wang, Y. et al. Telomere-to-Telomere genome assembly of an endangered tree Berchemiella wilsonii (Rhamnaceae).
Sci Data (2025). https://doi.org/10.1038/s41597-025-06433-3


Received: 30 April 2025
Accepted: 02 December 2025
Published: 16 December 2025
DOI: https://doi.org/10.1038/s41597-025-06433-3

Source: Ecology - nature.com
