Different sampling strategies yield different microbial communities
The sampling strategies compared in this study (homogenizing tissue before subsampling and homogenizing tissue after subsampling) are common methods found in the literature for characterizing plant-associated microbial communities14,23,26,29. Procrustes analyses and community overlap between sampling strategies demonstrated that different strategies can capture disparate microbial communities within plants, with the extent of these differences depending on the community targeted and plant tissue type sampled. In FFE as well as bacterial and non-AM fungal communities in roots, subsamples from the same plant resulted in completely different sets of species recovered, illustrating the severe undersampling that is inherent to each of these strategies. With these sampling strategies, we are undoubtedly sacrificing power and accuracy to characterize the subtler aspects of plant microbiome interactions, despite often seeing community differences across landscapes, treatments or seasons.
Richness was higher when homogenizing before subsampling for bacteria only, despite differences observed in composition for all groups. It is perhaps surprising that homogenizing plant tissues before subsampling did not also recover more fungal species than homogenizing after subsampling, because the former approach initially represents more plant tissue. Indeed, a previous study showed that sample pooling or homogenizing before subsampling resulted in a higher richness of soil fungi compared to equally sized individual samples50. Song et al. (2015)50 also found that multiple individual subsamples, rather than the single homogenized subsample, resulted in higher richness. This may suggest that the scale at which we are physically able to break down the particle size of plant tissues, as opposed to soil, is not always fine enough to sufficiently homogenize the fungi within. Because of this, plant-associated microbial communities may require a greater sampling effort than soil microbes. Additionally, the removal of low-abundance SVs did not result in differences in richness between the two sampling strategies for any microbial group, suggesting that neither strategy is better at capturing rare species. Although this study was performed only on milkweed plants, we believe that these results are applicable to other plant species as well. The richness reported here is similar to other studies of plant-associated microbes (e.g.51,52), indicating that differences in subsamples were not due to extreme richness of milkweed-associated microbes.
Microbial diversity should inform sampling effort
The higher congruency that we saw between sampling strategies for AMF compared to other microbial communities may be due to the differences in their local and global estimated richness. While the global number of AMF species has been estimated in the hundreds to low thousands42,53, global estimates of fungal species in general range in the millions54,55. A recent global estimate of bacterial richness suggests similar scales56. In this study specifically, AMF had the lowest total SV richness and the greatest similarity between sampling strategies, while foliar fungal endophytes had the highest SV richness and the lowest overlap of SVs between strategies. Since the amount of tissue sampled was equal for all microbial communities, the sampling effort was likely much higher for AMF (relative to the whole AMF community) than it was for bacteria and non-AM fungi. Consequently, with each sample we are likely sampling a much larger proportion of true AMF species richness.
Even though the estimated total community richness was highest for foliar fungi, the average estimated richness per individual plant was highest for AMF. This suggests that similar AMF SVs re-occurred across all plants with low species turnover. On the other hand, fungi in leaves had lower average richness per plant (Fig. 4, Supplementary Fig. S3), but the highest total richness, meaning that there was higher turnover of FFE species among plants sampled. These results may be a direct reflection of the overall community richness of the different microbial groups as well as their ability to spread and co-occur within plants. Based on these patterns, more individual plants and a greater sampling effort within individuals are likely needed to characterize FFE communities compared to AMF communities.
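The contrast between per-plant richness and total richness can be made concrete with a turnover index. The following is a minimal sketch, not an analysis performed in this study: it computes Whittaker's multiplicative beta diversity (total richness divided by mean per-plant richness) on hypothetical toy data. The function name `whittaker_beta` and the SV labels are illustrative assumptions.

```python
def whittaker_beta(plant_svs):
    """Whittaker's multiplicative beta: total richness / mean per-plant richness.

    plant_svs: list of sets, each holding the SV identifiers found in one plant.
    """
    if not plant_svs:
        raise ValueError("need at least one plant")
    total_richness = len(set().union(*plant_svs))  # gamma diversity
    mean_richness = sum(len(svs) for svs in plant_svs) / len(plant_svs)  # mean alpha
    if mean_richness == 0:
        raise ValueError("no SVs detected in any plant")
    return total_richness / mean_richness


# Hypothetical toy data (not from the study).
# AMF-like pattern: many SVs per plant, mostly the same ones in every plant.
amf_like = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"a", "b", "d", "e"}]
# FFE-like pattern: few SVs per plant, different ones in every plant.
ffe_like = [{"f1", "f2"}, {"f3", "f4"}, {"f5", "f6"}]

print(whittaker_beta(amf_like))  # low turnover
print(whittaker_beta(ffe_like))  # high turnover
```

A value near 1 means every plant holds nearly the whole community, whereas larger values mean each plant holds only a small, distinct part of it, which is the situation that calls for more plants and more effort within each plant.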
Rare SVs contribute to variation among subsamples
Our results show that low-abundance, rare SVs largely contributed to the differences seen between sampling strategies. Even AMF communities, which were already similar, increased in overlap by 50% between strategies after low-abundance SVs (represented by < 0.05% of sequences, Table 1) were removed. Microbial community distributions are often characterized by long tails of low-abundance species15, and as such, the likelihood of resampling rare species in each replicate can be low. In one study, Zhou et al. (2011)57 randomly sampled a simulated community with an exponential distribution. They observed only a 53% overlap between two samples when sampling just 1% of that community. We see even more extreme differences in overlap in this study, where initial sampling effort is also low relative to the whole microbial community.
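The effect of low sampling effort on overlap can be illustrated with a small simulation. This is a rough sketch under our own assumptions, not a reproduction of the procedure in Zhou et al. (2011)57: the community size, decay rate, number of draws and the use of Jaccard overlap are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_overlap(n_species=1000, decay=0.01, n_draws=500, seed=0):
    """Draw two independent samples from one skewed community and return
    the Jaccard overlap (shared species / all species seen) between them."""
    rng = random.Random(seed)
    species = list(range(n_species))
    # Relative abundances decay exponentially with species rank (long tail).
    weights = [math.exp(-decay * rank) for rank in species]
    # Each sample is n_draws individuals drawn with replacement.
    sample_a = set(rng.choices(species, weights=weights, k=n_draws))
    sample_b = set(rng.choices(species, weights=weights, k=n_draws))
    return len(sample_a & sample_b) / len(sample_a | sample_b)


for n_draws in (50, 500, 5000, 50000):
    print(n_draws, round(simulate_overlap(n_draws=n_draws), 2))
```

Under these assumptions, the overlap between two replicate samples rises as more individuals are drawn, because the rare tail is sampled more consistently.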
The importance of rare microbes may vary and is easily overlooked in favor of highly abundant, and perhaps more influential, fungi or bacteria. However, due to the compositional nature inherent to amplicon data, those SVs that appear to be in low abundance at the time of sampling may only be relatively so. Also, we do not yet fully understand microbial species turnover or succession. Plant-associated microbial communities can change significantly in just a matter of months58, or even weeks59. In addition, the exact relationship between sequence number and biomass of a species is variable60, and there is little evidence, if any, that sequence number is in direct proportion to a species’ impact in an ecosystem. Some microbes may be more metabolically active than others, despite appearing to be present in smaller quantities61. The recovery of the rare microbial community is arguably just as vital as the recovery of species that appear more abundant.
Bioinformatics pipelines that artificially inflate the number of SVs, especially low-abundance or rare SVs, could potentially inflate the differences we see among community subsamples. Hundreds of bioinformatics approaches have been used to analyze amplicon data, and no consensus exists on which is best. However, a recent study comparing the performance of 360 different software and parameter combinations showed that DADA2 (which is what we used here), with no filter other than the removal of low-quality and chimeric sequences, was best for recovering true richness and composition from a mock fungal community of 189 different strains62. If anything, DADA2 can erroneously lump closely related species41, which would make it more conservative than other methods used. However, in an effort not to overestimate the true variation between strategies compared in this study, we assessed the relative importance of rare taxa through the gradual removal of less abundant sequences, and we also used LULU, which is sometimes employed to reduce artifactual diversity48. We also removed all SVs that could not be confidently assigned to known microbial taxa. Even with these approaches, substantial variation remained due to the inherent undersampling of the strategies compared.
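The logic of removing low-abundance SVs and re-examining overlap can be sketched as follows. This is an illustrative example on hypothetical counts, not the code used in this study. It assumes that the threshold is applied to each table's own sequence total and that overlap is measured as shared SVs divided by all SVs observed; the text does not specify either choice, and the names `filter_rare` and `shared_fraction` are ours.

```python
def filter_rare(counts, min_fraction):
    """Keep SVs whose share of the table's total sequences is >= min_fraction."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sv: n for sv, n in counts.items() if n / total >= min_fraction}


def shared_fraction(counts_a, counts_b):
    """Fraction of all observed SVs that occur in both tables."""
    all_svs = set(counts_a) | set(counts_b)
    if not all_svs:
        return 0.0
    return len(set(counts_a) & set(counts_b)) / len(all_svs)


# Hypothetical sequence counts per SV for the two sampling strategies.
strategy_a = {"SV1": 6000, "SV2": 3995, "SV3": 3, "SV4": 2}
strategy_b = {"SV1": 5000, "SV2": 4996, "SV5": 4}

threshold = 0.0005  # 0.05% of sequences
before = shared_fraction(strategy_a, strategy_b)
after = shared_fraction(filter_rare(strategy_a, threshold),
                        filter_rare(strategy_b, threshold))
print(before, after)
```

In this toy case, all of the disagreement between the two strategies comes from SVs below the threshold, so overlap becomes complete once they are removed.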
Severe undersampling obscures subtle community variation
With the possible exception of AMF, none of the sampling effort curves approached an asymptote, meaning that both sampling strategies failed to adequately characterize the microbial communities present within a single plant or plant community (Supplementary Fig. S2). We found that multiple replicates from a single plant can vary by nearly 80% in FFE SVs, even when extracting from larger amounts (250 mg vs. 30 mg) of tissue. Lindahl et al. (2013)63 suggest that if duplicate subsamples differ much in community composition, then these differences threaten to obscure finer-scale treatment effects and ecological correlations, and that sampling effort should be increased. Indeed, a more robust sampling effort through the use of multiple technical replicates revealed remarkably strong (R² = 0.91) and significant host filtering within each individual plant that would have gone unobserved had DNA been extracted from just a single replicate per plant. Although we may be able to observe patterns in undersampled data among sites or treatments, it is difficult to train models and make predictions or inferences about the larger microbial population.
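One common way to judge whether sampling effort approaches an asymptote is a sample-based accumulation curve. The sketch below is a generic illustration on hypothetical data, not necessarily the method behind Supplementary Fig. S2: it averages cumulative richness over random orderings of the replicates, and the function name and parameters are assumptions.

```python
import random


def accumulation_curve(replicates, n_permutations=100, seed=0):
    """Mean cumulative SV richness as replicates are added in random order.

    replicates: list of sets, each holding the SVs found in one replicate.
    Returns one value per number of replicates pooled (1, 2, ..., n).
    """
    rng = random.Random(seed)
    totals = [0] * len(replicates)
    for _ in range(n_permutations):
        order = list(replicates)
        rng.shuffle(order)
        seen = set()
        for i, svs in enumerate(order):
            seen |= svs  # pool this replicate with those already added
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Hypothetical replicates: one undersampled set and one saturated set.
undersampled = [{"a", "b"}, {"c", "d"}, {"e", "f"}]
saturated = [{"a", "b"}, {"a", "b"}, {"a", "b"}]

print(accumulation_curve(undersampled))  # keeps rising with each replicate
print(accumulation_curve(saturated))     # flat from the first replicate
```

A curve that is still climbing at the final replicate, as in the first case, indicates that further replicates would continue to recover new SVs.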
As Unterseher et al. (2011)15 suggest, it is often unnecessary to saturate richness in microbial communities, but this should be carefully considered before developing experiments and testing hypotheses. One must take into account the objectives of the study and the accuracy and precision required to meet those objectives. Although the methods traditionally employed to sample plant-associated microbes may be sufficient to generally observe landscape-scale differences, it is important to recognize that we are not characterizing these communities; rather, we are taking a sliver of a ‘snapshot’ of species composition from a single point in time. A large proportion of true microbial diversity for most systems will likely remain undetected, and the specific results may be limited in their replicability.
Summary and recommendations
Although it used to be common practice, multiple studies now suggest that duplicate and triplicate PCR reactions are unnecessary for fungi and bacteria64,65. However, based on the results of this study, we recommend the inclusion of a different kind of technical replicate (i.e., multiple extraction replicates from a single plant), in addition to biological replicates (multiple plants in a single population), especially when studying factors that may generate subtler differences in plant-associated microbial communities. We show that extracting DNA from the standard 25–30 mg (dry weight) per plant can result in microbial communities that vary by as much as 100%, and extractions of 250 mg from a single plant can vary by as much as 79%. The need for increased replication is particularly important if site, treatment, or seasonal differences may be obscured by other environmental drivers. Striving for a more comprehensive understanding of the depth and structure of plant microbiomes and their response to their surrounding environments will help us to better understand the exact functions of plant–microbe associations and how we might manipulate plant microbiomes in order to reduce disease or increase plant productivity in the future.
Sample size and sampling effort have surpassed sequencing depth and cultivation as the bottleneck when characterizing plant-associated microbial communities. A good sampling design is essential to approximate underlying patterns in microbial community composition in a reproducible manner and both sampling effort and size should be clearly justified. Schloss (2018)20 elaborates on the concern of replicability and reproducibility with the growing use of Illumina-based studies of microbial communities, and describes PCR bias, sequencing errors, and cryptic or poorly described bioinformatics as preventing data from being generalizable to other environments. Undersampling and poor to absent descriptions of sampling effort and strategy also contribute to this problem, and the current frequency of undersampling should be concerning. The differences we see here between sampling strategies and the extreme variation among replicates suggest that many studies of plant-associated microbial communities may not be sufficiently replicable or reproducible.
Due to variation in community structure among AMF, bacteria, and non-AM fungi, standardizing a sampling protocol for all organisms is difficult, and best practices will, to some degree, depend on the specific organism targeted, richness, and site. Since neither sampling approach appeared to outperform the other, in many studies the overall sampling effort may be of greater importance. For example, when investigating landscape-scale differences in abundant or species-poor microorganisms, a smaller sampling effort is often sufficient. However, we suggest that more diverse plant-associated microbial communities, such as foliar fungal endophytes and root-associated bacteria, necessitate a more robust sampling effort than what is currently practiced in the literature. Per-sample richness, relative to the estimated total community richness, should always be considered when determining the optimum sampling strategy for any system. For example, sampling strategies and volumes sufficient for sampling AMF communities in extreme environments are likely not adequate for sampling fungal endophytes in the tropics where richness is high66. The need for increased sampling effort is especially pertinent if noise associated with sites, treatments or sample processing may obscure the differences among them. In studies that fail to see differences in microbial communities among sites or treatments, sampling effort should always be examined as a potential impediment.
In summary, we recommend that: (1) authors provide more transparent, detailed sampling information as well as sequencing and sampling effort curves, (2) sampling effort is not arbitrary, but is adjusted based on the diversity of plant microbiomes and per-sample richness relative to total community richness (both of which may require preliminary sampling), and (3) authors consider increased sampling effort when investigating smaller-scale drivers of microbial communities such as host filtering or subtle gradients, or when attempting to truly characterize microbial communities. Finally, controlling for the amount of plant tissue sampled both before and after homogenization, although not tested here, may be the best strategy for reducing potential bias. Standardizing how we sample plant-associated microbial communities as suggested by Dickie et al. (2018)19, or at the very least, insisting on more robust and transparent sampling strategies, will allow for more accurate and comprehensive analyses as well as better cross-study comparisons in the future.
Source: Ecology - nature.com