
Integrating theory and machine learning to reveal determinants of plasmid copy number


Abstract

Plasmids are extrachromosomal mobile genetic elements whose copy numbers (PCNs) critically influence microbial evolution, antibiotic resistance and pathogenicity. Despite their importance and immense diversity, the ecological, evolutionary and molecular factors that determine PCN remain poorly understood. Here, we present a theoretical model that explains the empirical power-law relationship between plasmid size and copy number, one of the fundamental quantitative principles governing PCN control. However, this relationship alone has limited predictive power. To improve PCN prediction, we introduce a data-driven approach that incorporates diverse features. Trained and tested on 11,051 plasmids, our machine learning model achieves significantly higher accuracy, with plasmid-encoded protein domains emerging as key predictors. Applying this framework, we conduct a large-scale analysis of PCN distributions across hundreds of thousands of metagenomic plasmids (IMG/PR database) and tens of thousands of clinical isolates, revealing putative niche-specific taxonomic PCN hotspots and hypothesis-generating ecological trends. These results provide valuable insights into plasmid ecology and the surveillance of antibiotic resistance genes (ARGs), and shed light on the gut plasmidome, a “dark matter” of the human microbiome.
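The abstract refers to an empirical power-law relationship between plasmid size and copy number. As an illustrative sketch only, assuming the relationship has the form PCN ≈ c·L^b with L the plasmid length, the exponent b can be estimated by ordinary least squares on log-transformed data, because taking logarithms turns log PCN = log c + b·log L into a straight line. The helper names (`fit_power_law`, `predict_pcn`) and the example values (c = 100, b = -0.5) are hypothetical; they are not the fitted parameters or the theoretical model reported in this paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], copies: Sequence[float]) -> Tuple[float, float]:
    """Fit copies ~ c * size**b by least squares on log10-transformed values.

    Returns the prefactor c and the exponent b; b < 0 means larger
    plasmids tend to have fewer copies.
    """
    if len(sizes) != len(copies) or len(sizes) < 2:
        raise ValueError("need at least two paired (size, copy number) observations")
    if any(v <= 0 for v in list(sizes) + list(copies)):
        raise ValueError("sizes and copy numbers must be positive to take logs")
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(n) for n in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope in log-log space = power-law exponent
    c = 10 ** (mean_y - b * mean_x)  # intercept back-transformed to a prefactor
    return c, b


def predict_pcn(size: float, c: float, b: float) -> float:
    """Predicted copy number for a plasmid of the given size under the fitted law."""
    return c * size ** b


if __name__ == "__main__":
    # Hypothetical sizes (kb) and copy numbers generated from PCN = 100 * size**-0.5
    sizes = [2.0, 8.0, 32.0, 128.0]
    copies = [100 * s ** -0.5 for s in sizes]
    c, b = fit_power_law(sizes, copies)
    print(round(c, 3), round(b, 3))  # 100.0 -0.5
```

Fitting in log-log space weights relative rather than absolute errors, which suits copy numbers that span orders of magnitude. As the abstract notes, a size-only relationship like this has limited predictive power on its own.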

Data availability

All the data associated with this work are available at the GitHub repository (https://github.com/Iqra123isynbio/Plasmid_copy_number_Prediction) and have been archived on Zenodo (https://doi.org/10.5281/zenodo.19343249; ref. 65). Source data are provided with this paper.

Code availability

All code is available at the GitHub repository (https://github.com/Iqra123isynbio/Plasmid_copy_number_Prediction) and has been archived on Zenodo (https://doi.org/10.5281/zenodo.19343249; ref. 65).

References

  1. Rodríguez-Beltrán, J., DelaFuente, J., León-Sampedro, R., MacLean, R. C. & San Millán, Á. Beyond horizontal gene transfer: the role of plasmids in bacterial evolution. Nat. Rev. Microbiol. 19, 347–359 (2021).

  2. Ramiro-Martínez, P., de Quinto, I., Lanza, V. F., Gama, J. A. & Rodríguez-Beltrán, J. Universal rules govern plasmid copy number. Nat. Commun. 16, 6022 (2025).

  3. Maddamsetti, R. et al. Scaling laws of bacterial and archaeal plasmids. Nat. Commun. 16, 6023 (2025).

  4. San Millan, A., Escudero, J. A., Gifford, D. R., Mazel, D. & MacLean, R. C. Multicopy plasmids potentiate the evolution of antibiotic resistance in bacteria. Nat. Ecol. Evol. 1, 0010 (2016).

  5. Yao, Y. et al. Intra- and interpopulation transposition of mobile genetic elements driven by antibiotic selection. Nat. Ecol. Evol. 6, 555–564 (2022).

  6. Hernandez-Beltran, J. C. R. et al. Plasmid-mediated phenotypic noise leads to transient antibiotic resistance in bacteria. Nat. Commun. 15, 2610 (2024).

  7. Wang, H. et al. Increased plasmid copy number is essential for Yersinia T3SS function and virulence. Science 353, 492–495 (2016).

  8. Sidhu, R. K. et al. Attenuation of virulence in Yersinia pestis across three plague pandemics. Science 388, eadt3880 (2025).

  9. Wang, H. & Joffré, E. Plasmid copy number as a modulator in bacterial pathogenesis and antibiotic resistance. npj Antimicrob. Resist. 3, 72 (2025).

  10. Joshi, S. H.-N., Yong, C. & Gyorgy, A. Inducible plasmid copy number control for synthetic biology in commonly used E. coli strains. Nat. Commun. 13, 6691 (2022).

  11. Kumar, S., Lezia, A. & Hasty, J. Engineering plasmid copy number heterogeneity for dynamic microbial adaptation. Nat. Microbiol. 9, 2173–2184 (2024).

  12. Rouches, M. V., Xu, Y., Cortes, L. B. G. & Lambert, G. A plasmid system with tunable copy number. Nat. Commun. 13, 3908 (2022).

  13. Son, H.-I. et al. Population-level amplification of gene regulation by programmable gene transfer. Nat. Chem. Biol. 21, 939–948 (2025).

  14. Camargo, A. P. et al. IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata. Nucleic Acids Res. 52, D164–D173 (2024).

  15. Lee, C., Kim, J., Shin, S. G. & Hwang, S. Absolute and relative QPCR quantification of plasmid copy number in Escherichia coli. J. Biotechnol. 123, 273–280 (2006).

  16. Browne, H. P. et al. Culturing of ‘unculturable’ human microbiota reveals novel taxa and extensive sporulation. Nature 533, 543–546 (2016).

  17. Zhang, Z. et al. Assessment of global health risk of antibiotic resistance genes. Nat. Commun. 13, 1553 (2022).

  18. Lucks, J. B., Qi, L., Whitaker, W. R. & Arkin, A. P. Toward scalable parts families for predictable design of biological circuits. Curr. Opin. Microbiol. 11, 567–573 (2008).

  19. Leinonen, R., Sugawara, H., Shumway, M. & International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2010).

  20. Soppa, J. Polyploidy and community structure. Nat. Microbiol. 2, 1–2 (2017).

  21. Mendell, J. E., Clements, K. D., Choat, J. H. & Angert, E. R. Extreme polyploidy in a large bacterium. Proc. Natl. Acad. Sci. 105, 6730–6734 (2008).

  22. Takacs, C. N. et al. Polyploidy, regular patterning of genome copies, and unusual control of DNA partitioning in the Lyme disease spirochete. Nat. Commun. 13, 7173 (2022).

  23. Hamrick, G. S. et al. Programming dynamic division of labor using horizontal gene transfer. ACS Synth. Biol. 13, 1142–1151 (2024).

  24. Xue, W., Hong, J. & Wang, T. The evolutionary landscape of prokaryotic chromosome/plasmid balance. Commun. Biol. 7, 1434 (2024).

  25. Doolittle, W. F. & Sapienza, C. Selfish genes, the phenotype paradigm and genome evolution. Nature 284, 601–603 (1980).

  26. Lopatkin, A. J. et al. Persistence and reversal of plasmid-mediated antibiotic resistance. Nat. Commun. 8, 1689 (2017).

  27. Wang, T. & You, L. The persistence potential of transferable plasmids. Nat. Commun. 11, 5589 (2020).

  28. Hall, J. P. et al. Plasmid fitness costs are caused by specific genetic conflicts enabling resolution by compensatory mutation. PLoS Biol. 19, e3001225 (2021).

  29. Thomas, C. M. et al. Annotation of plasmid genes. Plasmid 91, 61–67 (2017).

  30. Yu, M. K., Fogarty, E. C. & Eren, A. M. Diverse plasmid systems and their ecology across human gut metagenomes revealed by PlasX and MobMess. Nat. Microbiol. 9, 830–847 (2024).

  31. Paysan-Lafosse, T. et al. The Pfam protein families database: embracing AI/ML. Nucleic Acids Res. 53, D523–D534 (2025).

  32. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma. 11, 119 (2010).

  33. Finn, R. D. et al. Pfam: clans, web tools and services. Nucleic Acids Res. 34, D247–D251 (2006).

  34. Motallebi-Veshareh, M., Rouch, D. & Thomas, C. A family of ATPases involved in active partitioning of diverse bacterial plasmids. Mol. Microbiol. 4, 1455–1463 (1990).

  35. Mitchell, A. et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43, D213–D221 (2015).

  36. Campbell, E. A. et al. Structure of the bacterial RNA polymerase promoter specificity σ subunit. Mol. Cell 9, 527–539 (2002).

  37. Seabold, R. R. & Schleif, R. F. Apo-AraC actively seeks to loop. J. Mol. Biol. 278, 529–538 (1998).

  38. Kleiger, G. & Eisenberg, D. GXXXG and GXXXA motifs stabilize FAD and NAD(P)-binding Rossmann folds through Cα–H···O hydrogen bonds and van der Waals interactions. J. Mol. Biol. 323, 69–76 (2002).

  39. De Meo, P., Ferrara, E., Fiumara, G. & Provetti, A. Generalized Louvain method for community detection in large networks. In 11th International Conference on Intelligent Systems Design and Applications 88–93 (2011).

  40. Bethke, J. H. et al. Environmental and genetic determinants of plasmid mobility in pathogenic Escherichia coli. Sci. Adv. 6, eaax3173 (2020).

  41. Cooper, A. L., Wong, A., Tamber, S., Blais, B. W. & Carrillo, C. D. Analysis of antimicrobial resistance in bacterial pathogens recovered from food and human sources: insights from 639,087 bacterial whole-genome sequences in the NCBI Pathogen Detection database. Microorganisms 12, 709 (2024).

  42. Alcock, B. P. et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res. 51, D690–D699 (2023).

  43. Avire, N. J., Whiley, H. & Ross, K. A review of Streptococcus pyogenes: public health risk factors, prevention and control. Pathogens 10, 248 (2021).

  44. Higgins, D. A. et al. The major Vibrio cholerae autoinducer and its role in virulence factor production. Nature 450, 883–886 (2007).

  45. Hazen, T. C., Fliermans, C. B., Hirsch, R. P. & Esch, G. W. Prevalence and distribution of Aeromonas hydrophila in the United States. Appl. Environ. Microbiol. 36, 731–738 (1978).

  46. Salyers, A. A., Gupta, A. & Wang, Y. Human intestinal bacteria as reservoirs for antibiotic resistance genes. Trends Microbiol. 12, 412–416 (2004).

  47. Jurėnas, D., Fraikin, N., Goormaghtigh, F. & Van Melderen, L. Biology and evolution of bacterial toxin–antitoxin systems. Nat. Rev. Microbiol. 20, 335–350 (2022).

  48. Luo, X. et al. Characterization of DinJ-YafQ toxin–antitoxin module in Tetragenococcus halophilus: activity, interplay, and evolution. Appl. Microbiol. Biotechnol. 105, 3659–3672 (2021).

  49. San Millan, A. et al. Small-plasmid-mediated antibiotic resistance is enhanced by increases in plasmid copy number and bacterial fitness. Antimicrob. Agents Chemother. 59, 3335–3341 (2015).

  50. van Mastrigt, O., Lommers, M. M., de Vries, Y. C., Abee, T. & Smid, E. J. Dynamics in copy numbers of five plasmids of a dairy Lactococcus lactis strain under dairy-related conditions, including near-zero growth rates. Appl. Environ. Microbiol. 84, e00314-18 (2018).

  51. Uhlin, B. E., Molin, S., Gustafsson, P. & Nordström, K. Plasmids with temperature-dependent copy number for amplification of cloned genes and their products. Gene 6, 91–106 (1979).

  52. Akasaka, N. et al. Change in the plasmid copy number in acetic acid bacteria in response to growth phase and acetic acid concentration. J. Biosci. Bioeng. 119, 661–668 (2015).

  53. Saad, A. et al. Plasmid copy number variation impacts pathogenicity and quantification of Curtobacterium flaccumfaciens pv. flaccumfaciens infecting the mung bean. Plant Pathol. 74, 2670–2681 (2025).

  54. Wedel, E. et al. Insertion sequences determine plasmid adaptation to new bacterial hosts. mBio 14, e03158-22 (2023).

  55. Peng, K. et al. Long-read metagenomic sequencing reveals that high-copy small plasmids shape the highly prevalent antibiotic resistance genes in the animal fecal microbiome. Sci. Total Environ. 893, 164585 (2023).

  56. Potter, S. C. et al. HMMER web server: 2018 update. Nucleic Acids Res. 46, W200–W204 (2018).

  57. Ahlmann-Eltze, C. & Huber, W. Comparison of transformations for single-cell RNA-seq data. Nat. Methods 20, 665–672 (2023).

  58. Pedregosa, F. et al. Scikit-learn: machine learning in Python (2012).

  59. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).

  60. Rajewska, M., Wegrzyn, K. & Konieczny, I. AT-rich region and repeated sequences – the essential elements of replication origins of bacterial replicons. FEMS Microbiol. Rev. 36, 408–434 (2012).

  61. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  62. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).

  63. Robertson, J. & Nash, J. H. E. MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Microb. Genom. 4, e000206 (2018).

  64. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).

  65. Shahzadi, I. et al. Integrating theory and machine learning to reveal determinants of plasmid copy number. Zenodo https://doi.org/10.5281/zenodo.19343249 (2026).

Acknowledgments

This study was supported by the National Key R&D Program of China (2024YFA0920200 to TW), the National Natural Science Foundation of China (32470701 and 12401660 to TW), and the Shenzhen Institute of Synthetic Biology Scientific Research Program (HSE499011086 to TW). We are grateful to the Shenzhen Infrastructure for Synthetic Biology for providing instrument support and technical assistance.

Author information

Contributions

I.S. and T.W. conceptualized the study. I.S. performed data analysis, developed the machine learning framework, prepared figures, and drafted the manuscript. H.U.U. contributed to data visualization. W.X., R.M., and L.Y. contributed to data processing and manuscript revision. T.W. supervised the study, acquired funding, provided conceptual guidance, and revised the manuscript.

Corresponding author

Correspondence to Teng Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Peer Review File (PDF)

Reporting Summary (PDF)

Source data

Source Data (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shahzadi, I., Xue, W., Ubaid Ullah, H. et al. Integrating theory and machine learning to reveal determinants of plasmid copy number. Nat Commun (2026). https://doi.org/10.1038/s41467-026-72303-0

  • DOI: https://doi.org/10.1038/s41467-026-72303-0


Source: Ecology - nature.com