Abstract
Plasmids are extrachromosomal mobile genetic elements whose copy numbers (PCNs) critically influence microbial evolution, antibiotic resistance and pathogenicity. Despite their importance and immense diversity, the ecological, evolutionary and molecular factors that determine PCN remain poorly understood. Here, we present a theoretical model that explains the empirical power-law relationship between plasmid size and copy number, one of the fundamental quantitative principles governing PCN control. However, this relationship alone has limited predictive power. To improve PCN prediction, we introduce a data-driven approach that incorporates diverse features. Trained and tested on 11,051 plasmids, our machine learning model achieves significantly higher accuracy, with plasmid-encoded protein domains emerging as key predictors. Applying this framework, we conduct a large-scale analysis of PCN distributions across hundreds of thousands of metagenomic plasmids (IMG/PR database) and tens of thousands of clinical isolates, revealing putative niche-specific taxonomic PCN hotspots and hypothesis-generating ecological trends. These results provide valuable insights into plasmid ecology and antibiotic resistance gene (ARG) surveillance, and shed light on the gut plasmidome, a “dark matter” of the human microbiome.
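A power law between plasmid size and copy number, PCN ≈ a·L^b for plasmid length L, becomes a straight line in log–log space, log PCN = log a + b·log L, so its exponent can be estimated by ordinary least squares on log-transformed values. The sketch below illustrates only that fitting step, under stated assumptions: the helper names `fit_power_law` and `predict_pcn` are hypothetical, the data are synthetic, and the constants (a = 1000, b = −0.5) are arbitrary illustrative values, not the prefactor or exponent estimated in this work. It is not the authors' theoretical model or machine learning pipeline.

```python
import math


def fit_power_law(sizes, copy_numbers):
    """Fit copy_number ~ a * size**b by least squares on log10-transformed data.

    Hypothetical helper for illustration; all values must be positive.
    Returns the prefactor a and the scaling exponent b.
    """
    if len(sizes) != len(copy_numbers) or len(sizes) < 2:
        raise ValueError("need at least two paired observations")
    if any(v <= 0 for v in list(sizes) + list(copy_numbers)):
        raise ValueError("sizes and copy numbers must be positive")
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(c) for c in copy_numbers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope in log-log space = scaling exponent
    log_a = mean_y - b * mean_x  # intercept in log-log space
    return 10 ** log_a, b


def predict_pcn(size, a, b):
    """Predict copy number from plasmid size with a fitted power law."""
    return a * size ** b


# Synthetic example: sizes in bp, copy numbers generated from an
# arbitrary power law (a = 1000, b = -0.5), not empirical data.
sizes = [2_000, 5_000, 10_000, 50_000, 100_000]
pcns = [1000 * s ** -0.5 for s in sizes]
a, b = fit_power_law(sizes, pcns)
print(f"a = {a:.1f}, b = {b:.3f}")  # prints: a = 1000.0, b = -0.500
```

Because the fit is done on log values, a given fold-error in copy number is weighted equally at every plasmid size. On real data the residual scatter around such a line is what limits a size-only predictor, which is why additional features are brought in.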
Data availability
All data associated with this work are available in the GitHub repository (https://github.com/Iqra123isynbio/Plasmid_copy_number_Prediction) and have been archived on Zenodo (https://doi.org/10.5281/zenodo.19343249; ref. 65). Source data are provided with this paper.
Code availability
All code is available in the GitHub repository (https://github.com/Iqra123isynbio/Plasmid_copy_number_Prediction) and has been archived on Zenodo (https://doi.org/10.5281/zenodo.19343249; ref. 65).
References
Rodríguez-Beltrán, J., DelaFuente, J., León-Sampedro, R., MacLean, R. C. & San Millán, Á. Beyond horizontal gene transfer: the role of plasmids in bacterial evolution. Nat. Rev. Microbiol. 19, 347–359 (2021).
Ramiro-Martínez, P., de Quinto, I., Lanza, V. F., Gama, J. A. & Rodríguez-Beltrán, J. Universal rules govern plasmid copy number. Nat. Commun. 16, 6022 (2025).
Maddamsetti, R. et al. Scaling laws of bacterial and archaeal plasmids. Nat. Commun. 16, 6023 (2025).
San Millan, A., Escudero, J. A., Gifford, D. R., Mazel, D. & MacLean, R. C. Multicopy plasmids potentiate the evolution of antibiotic resistance in bacteria. Nat. Ecol. Evol. 1, 0010 (2016).
Yao, Y. et al. Intra- and interpopulation transposition of mobile genetic elements driven by antibiotic selection. Nat. Ecol. Evol. 6, 555–564 (2022).
Hernandez-Beltran, J. C. R. et al. Plasmid-mediated phenotypic noise leads to transient antibiotic resistance in bacteria. Nat. Commun. 15, 2610 (2024).
Wang, H. et al. Increased plasmid copy number is essential for Yersinia T3SS function and virulence. Science 353, 492–495 (2016).
Sidhu, R. K. et al. Attenuation of virulence in Yersinia pestis across three plague pandemics. Science 388, eadt3880 (2025).
Wang, H. & Joffré, E. Plasmid copy number as a modulator in bacterial pathogenesis and antibiotic resistance. npj Antimicrob. Resist. 3, 72 (2025).
Joshi, S. H.-N., Yong, C. & Gyorgy, A. Inducible plasmid copy number control for synthetic biology in commonly used E. coli strains. Nat. Commun. 13, 6691 (2022).
Kumar, S., Lezia, A. & Hasty, J. Engineering plasmid copy number heterogeneity for dynamic microbial adaptation. Nat. Microbiol. 9, 2173–2184 (2024).
Rouches, M. V., Xu, Y., Cortes, L. B. G. & Lambert, G. A plasmid system with tunable copy number. Nat. Commun. 13, 3908 (2022).
Son, H.-I. et al. Population-level amplification of gene regulation by programmable gene transfer. Nat. Chem. Biol. 21, 939–948 (2025).
Camargo, A. P. et al. IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata. Nucleic Acids Res. 52, D164–D173 (2024).
Lee, C., Kim, J., Shin, S. G. & Hwang, S. Absolute and relative QPCR quantification of plasmid copy number in Escherichia coli. J. Biotechnol. 123, 273–280 (2006).
Browne, H. P. et al. Culturing of ‘unculturable’ human microbiota reveals novel taxa and extensive sporulation. Nature 533, 543–546 (2016).
Zhang, Z. et al. Assessment of global health risk of antibiotic resistance genes. Nat. Commun. 13, 1553 (2022).
Lucks, J. B., Qi, L., Whitaker, W. R. & Arkin, A. P. Toward scalable parts families for predictable design of biological circuits. Curr. Opin. Microbiol. 11, 567–573 (2008).
Leinonen, R., Sugawara, H., Shumway, M. & International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2010).
Soppa, J. Polyploidy and community structure. Nat. Microbiol. 2, 1–2 (2017).
Mendell, J. E., Clements, K. D., Choat, J. H. & Angert, E. R. Extreme polyploidy in a large bacterium. Proc. Natl. Acad. Sci. 105, 6730–6734 (2008).
Takacs, C. N. et al. Polyploidy, regular patterning of genome copies, and unusual control of DNA partitioning in the Lyme disease spirochete. Nat. Commun. 13, 7173 (2022).
Hamrick, G. S. et al. Programming dynamic division of labor using horizontal gene transfer. ACS Synth. Biol. 13, 1142–1151 (2024).
Xue, W., Hong, J. & Wang, T. The evolutionary landscape of prokaryotic chromosome/plasmid balance. Commun. Biol. 7, 1434 (2024).
Doolittle, W. F. & Sapienza, C. Selfish genes, the phenotype paradigm and genome evolution. Nature 284, 601–603 (1980).
Lopatkin, A. J. et al. Persistence and reversal of plasmid-mediated antibiotic resistance. Nat. Commun. 8, 1689 (2017).
Wang, T. & You, L. The persistence potential of transferable plasmids. Nat. Commun. 11, 5589 (2020).
Hall, J. P. et al. Plasmid fitness costs are caused by specific genetic conflicts enabling resolution by compensatory mutation. PLoS Biol. 19, e3001225 (2021).
Thomas, C. M. et al. Annotation of plasmid genes. Plasmid 91, 61–67 (2017).
Yu, M. K., Fogarty, E. C. & Eren, A. M. Diverse plasmid systems and their ecology across human gut metagenomes revealed by PlasX and MobMess. Nat. Microbiol. 9, 830–847 (2024).
Paysan-Lafosse, T. et al. The Pfam protein families database: embracing AI/ML. Nucleic Acids Res. 53, D523–D534 (2025).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 11, 119 (2010).
Finn, R. D. et al. Pfam: clans, web tools and services. Nucleic Acids Res. 34, D247–D251 (2006).
Motallebi-Veshareh, M., Rouch, D. & Thomas, C. A family of ATPases involved in active partitioning of diverse bacterial plasmids. Mol. Microbiol. 4, 1455–1463 (1990).
Mitchell, A. et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43, D213–D221 (2015).
Campbell, E. A. et al. Structure of the bacterial RNA polymerase promoter specificity σ subunit. Mol. Cell 9, 527–539 (2002).
Seabold, R. R. & Schleif, R. F. Apo-AraC actively seeks to loop. J. Mol. Biol. 278, 529–538 (1998).
Kleiger, G. & Eisenberg, D. GXXXG and GXXXA motifs stabilize FAD and NAD(P)-binding Rossmann folds through Cα–H⋯O hydrogen bonds and van der Waals interactions. J. Mol. Biol. 323, 69–76 (2002).
De Meo, P., Ferrara, E., Fiumara, G. & Provetti, A. Generalized Louvain method for community detection in large networks. In Proc. 11th International Conference on Intelligent Systems Design and Applications 88–93 (2011).
Bethke, J. H. et al. Environmental and genetic determinants of plasmid mobility in pathogenic Escherichia coli. Sci. Adv. 6, eaax3173 (2020).
Cooper, A. L., Wong, A., Tamber, S., Blais, B. W. & Carrillo, C. D. Analysis of antimicrobial resistance in bacterial pathogens recovered from food and human sources: insights from 639,087 bacterial whole-genome sequences in the NCBI Pathogen Detection database. Microorganisms 12, 709 (2024).
Alcock, B. P. et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res. 51, D690–D699 (2023).
Avire, N. J., Whiley, H. & Ross, K. A review of Streptococcus pyogenes: public health risk factors, prevention and control. Pathogens 10, 248 (2021).
Higgins, D. A. et al. The major Vibrio cholerae autoinducer and its role in virulence factor production. Nature 450, 883–886 (2007).
Hazen, T. C., Fliermans, C. B., Hirsch, R. P. & Esch, G. W. Prevalence and distribution of Aeromonas hydrophila in the United States. Appl. Environ. Microbiol. 36, 731–738 (1978).
Salyers, A. A., Gupta, A. & Wang, Y. Human intestinal bacteria as reservoirs for antibiotic resistance genes. Trends Microbiol. 12, 412–416 (2004).
Jurėnas, D., Fraikin, N., Goormaghtigh, F. & Van Melderen, L. Biology and evolution of bacterial toxin–antitoxin systems. Nat. Rev. Microbiol. 20, 335–350 (2022).
Luo, X. et al. Characterization of DinJ-YafQ toxin–antitoxin module in Tetragenococcus halophilus: activity, interplay, and evolution. Appl. Microbiol. Biotechnol. 105, 3659–3672 (2021).
San Millan, A. et al. Small-plasmid-mediated antibiotic resistance is enhanced by increases in plasmid copy number and bacterial fitness. Antimicrob. Agents Chemother. 59, 3335–3341 (2015).
van Mastrigt, O., Lommers, M. M., de Vries, Y. C., Abee, T. & Smid, E. J. Dynamics in copy numbers of five plasmids of a dairy Lactococcus lactis strain under dairy-related conditions, including near-zero growth rates. Appl. Environ. Microbiol. 84, e00314–00318 (2018).
Uhlin, B. E., Molin, S., Gustafsson, P. & Nordström, K. Plasmids with temperature-dependent copy number for amplification of cloned genes and their products. Gene 6, 91–106 (1979).
Akasaka, N. et al. Change in the plasmid copy number in acetic acid bacteria in response to growth phase and acetic acid concentration. J. Biosci. Bioeng. 119, 661–668 (2015).
Saad, A. et al. Plasmid copy number variation impacts pathogenicity and quantification of Curtobacterium flaccumfaciens pv. flaccumfaciens infecting the mung bean. Plant Pathol. 74, 2670–2681 (2025).
Wedel, E. et al. Insertion sequences determine plasmid adaptation to new bacterial hosts. mBio 14, e03158–03122 (2023).
Peng, K. et al. Long-read metagenomic sequencing reveals that high-copy small plasmids shape the highly prevalent antibiotic resistance genes in the animal fecal microbiome. Sci. Total Environ. 893, 164585 (2023).
Potter, S. C. et al. HMMER web server: 2018 update. Nucleic Acids Res. 46, W200–W204 (2018).
Ahlmann-Eltze, C. & Huber, W. Comparison of transformations for single-cell RNA-seq data. Nat. Methods 20, 665–672 (2023).
Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. (2012).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).
Rajewska, M., Wegrzyn, K. & Konieczny, I. AT-rich region and repeated sequences – the essential elements of replication origins of bacterial replicons. FEMS Microbiol. Rev. 36, 408–434 (2012).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Robertson, J. & Nash, J. H. E. MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Microb. Genom. 4, e000206 (2018).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Shahzadi, I. et al. Integrating theory and machine learning to reveal determinants of plasmid copy number. Zenodo https://doi.org/10.5281/zenodo.19343249 (2026).
Acknowledgments
This study was supported by the National Key R&D Program of China (2024YFA0920200 to TW), the National Natural Science Foundation of China (32470701 and 12401660 to TW), and the Shenzhen Institute of Synthetic Biology Scientific Research Program (HSE499011086 to TW). We are grateful to the Shenzhen Infrastructure for Synthetic Biology for providing instrument support and technical assistance.
Author information
Contributions
I.S. and T.W. conceptualized the study. I.S. performed data analysis, developed the machine learning framework, prepared figures, and drafted the manuscript. H.U.U. contributed to data visualization. W.X., R.M., and L.Y. contributed to data processing and manuscript revision. T.W. supervised the study, acquired funding, provided conceptual guidance, and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information (PDF)
Peer Review File (PDF)
Reporting Summary (PDF)
Source data
Source Data (XLSX)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shahzadi, I., Xue, W., Ubaid Ullah, H. et al. Integrating theory and machine learning to reveal determinants of plasmid copy number.
Nat Commun (2026). https://doi.org/10.1038/s41467-026-72303-0
