Metagenome-assembled genome extraction and analysis from microbiomes using KBase
