Diversity and compositional changes in the gut microbiota of wild and captive vertebrates: a meta-analysis
Literature search and data retrievalWe performed a systematic literature search on the internet (Google Scholar, Web of Science) using the following keywords: [gut microbiota], [animal microbiome], [gut microbiome 16S] and [captive AND wild AND microbiota]. This search yielded 222 articles on animal microbiomes published between 2014 and 2020. The materials and methods of these articles were analysed to ascertain whether the study met the following criteria: (i) all wild and captive samples were processed using identical procedures, (ii) compared wild and captive animals were phylogenetically closely related (members of the same species or species complex), (iii) captive individuals were born in captivity, or no information was provided about the origin of the captive animals; i.e., wild animals brought into captivity and sampled some time later were excluded, (iv) captive animals that underwent a deliberate selection process (e.g. inbred mice or domestic animals) were also excluded for considering them genetically not comparable to the wild counterparts, and (v) only datasets with sample sizes over 12 individuals were considered for analysis. Raw data were extracted from the databases and repositories indicated in the articles (accession numbers listed in the “Bioinformatic resources”).Bioinformatic sequencing data processingDatafiles from the different studies were (i) stored at the University of Copenhagen’s Electronic Research Data Repository (ERDA), (ii) assigned a unique study identifier and (iii) re-processed in the Danish National Supercomputer for Life Sciences ‘Computerome2’ using a new bioinformatic pipeline we developed for processing data with different characteristics, including sequencing mode, read length and 16S rRNA gene fragment. The entire code can be found in the “Bioinformatic resources”. In short, for each individual dataset, we quality-filtered (mean phred score of q = 25) and (if necessary) trimmed and merged the paired-end reads based on the sequence overlap using AdapterRemoval224. Primers (if present) were trimmed using Cutadapt25, and reads were dereplicated with USEARCH Derep26 using a relative minimum copy number threshold of 0.01% of the total sequencing depth. Reads were then converted into zero-ratio OTUs using the denoising algorithm UNOISE327, and USEARCH was used to map the reads back to the OTUs and create an OTU table. HS-Blast28 was used to assign taxonomy against the non-redundant Silva 132 database29, and taxonomic assignments were filtered using different identity thresholds for each taxonomic level: 97% for genus-level taxonomy, 95% for family-level taxonomy, 92% for order-level taxonomy and 90% for higher taxonomic levels30. To minimise the impact of incorrectly assigned taxa, taxonomic annotations below these identity thresholds were converted into unclassified, and not considered in downstream analyses. This pipeline yielded relative read abundances assigned to different taxa for each individual dataset analysed.Data quality filteringIndividual data files generated by the aforementioned pipeline were aggregated by study and host species into genus-level abundance tables. The two datasets of Sarcophilus harrisii retrieved from two different studies were processed independently. Taxonomic resolution was limited to the genus level to maximise taxonomic annotation rate and minimise biases introduced by the different 16S rRNA gene markers employed in the analysed studies. On the one hand, wild animals’ microbial communities often contain taxa that do not map to any catalogued species with enough molecular similarity to assign species-level annotation. On the other hand, the analysed datasets were generated based on the V4, V3–V4 and V1–V3 regions of the 16S rRNA gene (Supplementary Dataset), which hindered comparability at the ASV or zOTU level. We then proceeded to quality-filter the genus-level abundance tables of each species through filtering individuals by minimum sequencing depth, minimum diversity coverage and taxonomic annotation. Only individual datasets with more than 1000 reads and diversity coverage values over 99% were retained, and final genus-level abundance tables that contained at least five animals in each contrasting group were considered for analysis. Since the studied datasets contained traces of dietary items and host DNA, read counts assigned to taxonomic groups not assigned to Bacteria genera, or not present in the LTPs132_SSU release of the SILVA Living Tree (https://www.arb-silva.de/projects/living-tree) used for measuring the phylogenetic relationships among bacteria, were removed to ensure accurate measurements of phylogenetic diversities. In the cases where one group (either wild or captive) outnumbered the other, samples were randomly selected to ensure even sample sizes.Diversity and compositional analysesDiversity and compositional analyses were carried out in the R statistical environment v.3.6.331 and Python 3.8 based on the Hill numbers framework. The operations explained below were conducted using the R packages ape32, dendextend33, dmetar34, hilldiv35, meta36, metamicrobiomeR37, phylosignal38, phytools39, treedist40, vegan41, and the python package qdiv42. Hereafter functions and their respective packages are displayed as ‘package::function’. Statistical significance level was set at a FDR-adjusted p-value of 0.05. All charts and figures in the manuscript were originally generated either in R (full code of all original figures is included in “Bioinformatic resources”) and subsequently modified in Adobe Illustrator to achieve the desired layout without distorting the dimensions of the quantitative elements.Hill numbersThe Hill numbers framework encompasses the group of diversity measures that quantify diversity in units of equivalent numbers of equally abundant taxa43,44—in our context bacteria genera. Hill numbers provide a general statistical framework that is sufficiently robust and flexible to address a wide range of scientific questions that molecular ecologists regularly try to answer through measurement, estimation, partitioning and comparison of diversities45. To obtain a complete vision of the gut microbiome differences between wild and captive animals, we conducted all our diversity and compositional analyses based on three contrasting Hill numbers based metrics: the so-called dR, which only accounts for richness (i.e., order of diversity 0, whether bacteria taxa were present or not), dRE which considered Richness + Evenness of order of diversity 1 (i.e., the relative abundances of bacteria are proportionally weighed) and dRER, which considered Richness, + Evenness + Regularity (i.e., the phylogenetic relationships among bacteria are accounted for). Detailed explanations of these metrics can be found elsewhere17,46,47.Phylogenetic treesThe dRER metric required a Bacterial phylogenetic tree to compute the relatedness among bacterial taxa. As our datasets contained different fragments of the 16S rRNA gene, and thus we were unable to generate a phylogenetic tree directly from our DNA sequence data, we relied on the SILVA Living Tree, and used the LTPs132_SSU release as the reference phylogenetic tree. In addition, the time-calibrated host phylogeny required by the host phylogenetic signal and phylosymbiosis analyses was generated using Timetree48.Diversity metrics and meta-analysisWe computed individual-based diversity metrics using the function hilldiv::hill_div, and obtained average alpha diversity metrics per species, as well as wild and captive populations per species. We used a Kruskal–Wallis (KW) test as implemented in the function hilldiv::div_test to ascertain whether the mean diversity values varied across analysed host species, and a PERMANOVA (PMV) test using vegan::adonis function based on the pairwise dissimilarity matrix to test whether host species were compositionally distinct.Average alpha diversity metrics of wild and captive populations per species were used to conduct a random-effects-model (REM) meta-analysis with raw effect sizes using the function meta::metacont. We used the Sidik–Jonkman estimator for the between-study variance and the Knapp–Hartung–Sidik–Jonkman adjustment method. The overall effect was calculated using Hedge’s g (SMD) and its 95% confidence interval and p-value. An identical analysis was performed for the entire dataset and two representative subsets of five species, containing only datasets derived from primates and cetartiodactylans. Higgin’s and Thompson’s I2 test, Tau-squared T2 and Cochran’s Q were used for quantifying the heterogeneity between the included species. Due to the high heterogeneity found in the study, we evaluated whether the between-study heterogeneity was caused by outliers with extreme effect sizes, which could be distorting our overall effect. We defined an outlier if the species’s confidence interval did not overlap with the confidence interval of the pooled effect using dmetar::find.outliers function.The function detected three outliers in dR metric (GOGO, PEMA and TUTR), four in dRE (GOGO, PEMA, MOCH, EQKI) and seven in dRER (RHBR, PYNE, PEMA, TUTR, MOCH, CENI and AIME). Even when these outliers were excluded from the analysis the I2 heterogeneity value was substantial for dR (I2 from 79.3 to 70.3%) and moderate for dRE (I2 from 80.1 to 60.0%) and dRER (I2 from 86.9 to 54.2%) and significant for both (Cochran’s Q, p-value More