Robust bacterial co-occurrence community structures are independent of r- and K-selection history

Selection-switch experiment

The dataset used in this article has been published previously¹⁴, but we include a brief summary for completeness. Natural seawater was collected and used to inoculate microcosms in a 2 × 2 factorial crossover design with 3 replicates. The experiment ran for 50 days, and the microcosms were sampled 18 times. Half of the microcosms were given a high (H) resource supply, whereas the other half were given a low (L) resource supply. The resource supply level was kept constant throughout the experiment. The other factor was the selection regime: the microcosms were either given a continuous supply of nutrients (favouring K-selection, hence the designation K) or pulse-fed with nutrients after their contents had been diluted with growth medium (favouring r-selection, designated R). The active selection regime was switched at the experimental halfway point (between days 28 and 29), yielding two selection groups designated RK and KR.

DNA was extracted from the collected samples, the V3-V4 region of the bacterial 16S rRNA gene was amplified by PCR using broad-coverage primers, and the index sequences were ligated. The amplicon library was pooled and sequenced in two runs on an Illumina MiSeq instrument. The reads are available at the European Nucleotide Archive under accession numbers ERS7182426-ERS7182513.

The USEARCH pipeline⁴⁷ (v11) was used to remove low-quality reads and to cluster the reads into OTUs at a 97% similarity level. Finally, the taxonomy of the OTUs was assigned by the Sintax classifier using the RDP training set (v16) with a confidence threshold of 80%.

Quantification of bacterial density

For each sample, the bacterial density was quantified by flow cytometry (BD Accuri C6)¹⁴. In brief, the bacterial communities were diluted in 0.1× TE buffer, mixed with 2× SYBR Green II RNA gel stain (ThermoFisher Scientific) and incubated in the dark at room temperature for 15 minutes. Each sample was then measured for 2.5 minutes at 35 μL min⁻¹ with an FL1-H (533/30 nm) threshold of 3000. The bacterial population was gated as the events with FL1-A > $10^4$ and FSC-A < $10^5$. The raw flow cytometry data files are available at https://doi.org/10.6084/m9.figshare.15104409.
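As a rough illustration of how the gate and the acquisition settings translate into a cell density, the Python sketch below counts the gated events and divides them by the analysed volume (2.5 min × 35 μL min⁻¹ = 87.5 μL). It assumes that the nominal flow rate gives the analysed volume and that the fold-dilution applied before staining is known; the function name and the `dilution_factor` parameter are hypothetical, since the dilution used is not stated here.

```python
import numpy as np

# Gate from the text: FL1-A > 1e4 and FSC-A < 1e5.
FL1_A_MIN = 1e4
FSC_A_MAX = 1e5
# Analysed volume from the acquisition settings: 2.5 min at 35 uL/min (assumed nominal).
ANALYSED_VOLUME_UL = 2.5 * 35.0


def cell_density_per_ml(fl1_a, fsc_a, dilution_factor):
    """Estimate cells per mL in the undiluted sample (hypothetical helper).

    fl1_a, fsc_a: per-event FL1-A and FSC-A values from one acquisition.
    dilution_factor: assumed known fold-dilution applied before staining.
    """
    fl1_a = np.asarray(fl1_a, dtype=float)
    fsc_a = np.asarray(fsc_a, dtype=float)
    # Boolean gate: bright in SYBR Green fluorescence, small in forward scatter.
    in_gate = (fl1_a > FL1_A_MIN) & (fsc_a < FSC_A_MAX)
    events_per_ul = in_gate.sum() / ANALYSED_VOLUME_UL
    # Convert events per uL to events per mL and undo the dilution.
    return float(events_per_ul * 1000.0 * dilution_factor)


# Four synthetic events: only the first and the last fall inside the gate.
density = cell_density_per_ml(
    fl1_a=[2e4, 5e3, 3e4, 2e4],
    fsc_a=[5e4, 5e4, 2e5, 9e4],
    dilution_factor=10,
)
```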

Alignment and phylogenetic tree

The selection-switch dataset was obtained directly from the authors¹⁴. It consists of 206 samples in total. Two of these samples were taken from the communities used to inoculate the reactors, whereas the other samples were taken from the microcosms (17 time points × 4 regimes × 3 replicates). We excluded the inoculum samples from further analysis. The OTU reference sequences were aligned with SINA version 1.6.1⁴⁸ using the SILVA Release 138 NR 99 SSU dataset⁴⁹. Using this alignment, the phylogenetic tree was constructed by neighbour-joining in MEGA X⁵⁰ with default parameters.

Filtering and preprocessing

The mean number of reads per sample was 63,460 with a standard deviation of 31,411. For our analysis, we wanted to estimate the abundance of each OTU as accurately as possible and therefore did not apply any correction for unequal sequencing depth. Read counts for each OTU in each sample were divided by the total number of reads for that sample, yielding relative abundances. Thereafter, all OTUs with a maximum relative abundance (across all samples) below a given threshold were removed. Three filtering thresholds (as count proportions) were applied: high at $5\cdot 10^{-3}$, medium at $1\cdot 10^{-3}$ and low at $5\cdot 10^{-4}$. The purpose of the filtering was to remove rare OTUs in order to reduce noise and avoid spurious correlations¹¹. To obtain estimates of absolute abundances, the relative abundances were scaled by the estimated total bacterial cell density of each sample. The phyloseq package (version 1.36.0)⁵¹ and the R programming language (version 4.1.1)⁵² facilitated this procedure. In addition, we wrote an R package named micInt (version 0.18.0, available at https://github.com/AlmaasLab/micInt) that provides a pipeline for the analysis.
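The Python sketch below mirrors these steps on a samples × OTUs count matrix. It is not the micInt or phyloseq code; the function name is hypothetical, and it assumes, as the text describes, that relative abundances are computed before filtering and are not renormalized afterwards.

```python
import numpy as np

# Filtering thresholds from the text, as count proportions.
THRESHOLDS = {"high": 5e-3, "medium": 1e-3, "low": 5e-4}


def filter_and_scale(counts, cell_density, level="medium"):
    """Relative abundances, max-abundance filtering and scaling by cell density.

    counts: samples x OTUs matrix of read counts.
    cell_density: estimated total cell density, one value per sample.
    Returns the indices of the kept OTUs, their relative abundances and
    their estimated absolute abundances.
    """
    counts = np.asarray(counts, dtype=float)
    # Divide each sample (row) by its total read count; no depth correction.
    rel = counts / counts.sum(axis=1, keepdims=True)
    # Drop OTUs whose maximum relative abundance over all samples is below the threshold.
    keep = np.flatnonzero(rel.max(axis=0) >= THRESHOLDS[level])
    rel_kept = rel[:, keep]
    # Absolute abundance estimate = relative abundance x total cell density of the sample.
    absolute = rel_kept * np.asarray(cell_density, dtype=float)[:, None]
    return keep, rel_kept, absolute


counts = [[990, 9, 1],
          [1000, 0, 0]]
keep, rel, absolute = filter_and_scale(counts, cell_density=[2e6, 1e6], level="high")
```

At the high threshold the third OTU (maximum relative abundance 0.001) is removed, while the low threshold keeps all three.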

Similarity measures and addition of noise

For this study, we used two similarity measures: the Pearson correlation and the Spearman correlation. A similarity measure, as referred to in this article, can be thought of as a function $f: \mathbb{R}^n \times \mathbb{R}^n \rightarrow D$, where $D = [-1,1]$. In this regard, $f(\mathbf{x},\mathbf{y})$ is the similarity of two abundance vectors $\mathbf{x}$ and $\mathbf{y}$ belonging to different OTUs, where $f(\mathbf{x},\mathbf{y}) = 1$ indicates perfect correlation, $f(\mathbf{x},\mathbf{y}) = 0$ indicates no correlation and $f(\mathbf{x},\mathbf{y}) = -1$ indicates perfect negative correlation. Noise was added to distort patterns of double zeros (samples in which both OTUs are absent), which could otherwise result in spurious correlations. Given two abundance vectors $\mathbf{x}$ and $\mathbf{y}$, normally distributed noise was added to each of them before the similarity measure was applied. Given a similarity measure $f$, the similarity between the abundance vectors after adding noise is given by:

$$f^*(\mathbf{x},\mathbf{y}) = f(\mathbf{x}+\boldsymbol{\varepsilon}_x,\ \mathbf{y}+\boldsymbol{\varepsilon}_y), \tag{1}$$

where $\boldsymbol{\varepsilon}_x$ and $\boldsymbol{\varepsilon}_y$ are random vectors whose components are independent and normally distributed with mean zero and variance $\gamma^2$. The noise level $\gamma$ was determined by the smallest non-zero relative abundance $x_{\min}$ in the dataset and a fixed constant $s$, called the magnitude factor, such that $\gamma = s\cdot x_{\min}$. For no noise $s=0$, for low noise $s=1$, for middle noise $s=10$ and for high noise $s=100$.
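A minimal Python sketch of Eq. (1), assuming Pearson correlation as $f$ (any similarity function can be passed in); the helper names are hypothetical. The toy vectors share eight double zeros while their two non-zero values are anti-correlated.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two 1-D vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def smallest_nonzero(abundances):
    """x_min: the smallest non-zero relative abundance in the dataset."""
    a = np.asarray(abundances, dtype=float)
    return float(a[a > 0].min())


def noisy_similarity(x, y, x_min, s, f=pearson, rng=None):
    """f*(x, y) = f(x + eps_x, y + eps_y) with eps ~ N(0, gamma^2), gamma = s * x_min."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gamma = s * x_min
    # Independent noise for every component of both vectors (zero noise when s = 0).
    eps_x = rng.normal(0.0, gamma, size=x.shape)
    eps_y = rng.normal(0.0, gamma, size=y.shape)
    return f(x + eps_x, y + eps_y)


# Eight shared zeros, but the non-zero parts are anti-correlated.
x = np.array([0.0] * 8 + [0.01, 0.02])
y = np.array([0.0] * 8 + [0.02, 0.01])
x_min = smallest_nonzero(np.concatenate([x, y]))
no_noise = noisy_similarity(x, y, x_min, s=0)
high_noise = noisy_similarity(x, y, x_min, s=100, rng=np.random.default_rng(0))
```

Without noise the double zeros alone give a Pearson correlation of 31/41 ≈ 0.76. In this toy case $\gamma = 100 \cdot 0.01 = 1$ also swamps the signal itself, so the example only shows the mechanism, not realistic noise magnitudes.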

Network creation

The significance of the pairwise OTU associations was determined by the ReBoot procedure introduced by Faust et al.²², which shares the underlying algorithm used in the CoNet Cytoscape package⁵³. This approach takes a dataset of microbial abundances and a similarity measure, and evaluates, for each pair of OTUs in the dataset, the null hypothesis $H_0$: “The association between the OTUs is caused by chance”. Bootstrapping over the samples and computing the similarity score of each OTU pair forms a bootstrap distribution. Randomly permuting the abundances of the OTUs and recomputing the pairwise similarity scores forms a permutation distribution. The bootstrap and permutation distributions are then compared with a two-sided Z-test (based on the normal distribution) to evaluate whether their difference is statistically significant. The z-value, p-value and q-value (calculated by the Benjamini-Hochberg-Yekutieli procedure⁵⁴) are reported for each pair of OTUs in the dataset. Our ReBoot implementation is based on the R package ccrepe (version 1.28.0)⁵⁵, but is integrated into the micInt package with the following major changes:

The original ReBoot uses renormalization of the permuted abundances to keep the sum-to-constant constraint. Whereas this is reasonable for relative abundances, our modified version allows this feature to be turned off when analysing absolute abundances.
Memory use and CPU consumption have been optimized to enable analyses of large datasets.
In contrast to the usual ReBoot procedure, networks generated by the different similarity measures are not merged by p-value, but kept separate.
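A compact Python sketch of the bootstrap/permutation comparison for a single OTU pair is given below. It is not the ccrepe or micInt code: the function name is hypothetical, and the Z statistic with a pooled standard deviation is an assumption about how the two distributions are compared. The `renormalize` flag mimics the sum-to-constant renormalization that the modified version can switch off for absolute abundances.

```python
import math
import numpy as np


def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])


def reboot_pair(data, k, l, f=pearson, n_iter=1000, renormalize=False, rng=None):
    """ReBoot-style test of the association between OTU columns k and l.

    data: samples x OTUs abundance matrix.
    Returns (observed similarity, z-value, two-sided p-value).
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    boot = np.empty(n_iter)
    perm = np.empty(n_iter)
    for it in range(n_iter):
        # Bootstrap: resample whole samples with replacement, keeping the pair together.
        idx = rng.integers(0, n, size=n)
        boot[it] = f(data[idx, k], data[idx, l])
        # Permutation: shuffle OTU l across samples to break its pairing with OTU k.
        shuffled = data.copy()
        shuffled[:, l] = rng.permutation(shuffled[:, l])
        if renormalize:
            # Restore the sum-to-one constraint of relative abundances.
            shuffled /= shuffled.sum(axis=1, keepdims=True)
        perm[it] = f(shuffled[:, k], shuffled[:, l])
    # Two-sided Z-test on the difference between the means of the two distributions.
    pooled_sd = math.sqrt(0.5 * (boot.var(ddof=1) + perm.var(ddof=1)))
    z = (boot.mean() - perm.mean()) / pooled_sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return f(data[:, k], data[:, l]), float(z), p


rng = np.random.default_rng(1)
a = rng.normal(size=40)
b = 2 * a + rng.normal(scale=0.1, size=40)
obs, z, p = reboot_pair(np.column_stack([a, b]), 0, 1, n_iter=200, rng=rng)
```

In a full analysis this would be repeated for every OTU pair, and the resulting p-values would then be corrected for multiple testing to obtain q-values.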

For our analysis, the number of bootstrap and permutation iterations was set to 1000. All OTUs absent in more than $n\cdot 10^{-4/n}$ samples, where $n$ is the total number of samples, were excluded through the errthresh argument but were still kept for renormalization (if turned on). The associations were computed across all samples, including those belonging to a different selection group or resource supply level.
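The exclusion bound can be rewritten as $n\cdot(10^{-4})^{1/n}$, so an OTU with $m$ zero samples is kept when $(m/n)^n \le 10^{-4}$, that is, when the probability that a bootstrap resample of all $n$ samples contains only zeros is at most $10^{-4}$. The Python sketch below reads errthresh $= 10^{-4}$ from the bound; the function names are hypothetical.

```python
def max_absent_samples(n, errthresh=1e-4):
    """Largest tolerated number of zero samples: n * errthresh**(1/n).

    With errthresh = 1e-4 this equals n * 10**(-4/n), the bound in the text.
    """
    return n * errthresh ** (1.0 / n)


def keep_otu(abundances, errthresh=1e-4):
    """True if the OTU is absent in at most n * errthresh**(1/n) samples."""
    n = len(abundances)
    zeros = sum(1 for value in abundances if value == 0)
    return zeros <= max_absent_samples(n, errthresh)


# For n = 204 samples the bound is just under 195 zero samples.
bound_204 = max_absent_samples(204)
```

For $n = 10$ the bound is $10\cdot 10^{-0.4} \approx 3.98$, so an OTU absent in 3 of 10 samples is kept and one absent in 4 is excluded.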

Dynamic PCoA visualization

All samples in the dataset were used for the PCoA ordination, for which the Bray-Curtis dissimilarity between the samples was computed before the decomposition. After the ordination was computed, the samples were divided into four facets based on their combination of current selection regime and resource supply. Finally, all samples belonging to the same microcosm were connected by a line in chronological order; each line was styled according to the resource supply and coloured to distinguish it from the two other replicate microcosms within the same facet.
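The ordination and the trajectory lines can be sketched in Python as follows, assuming classical (Gower) PCoA with negative eigenvalues simply clipped to zero; other software may handle them differently. The function names are hypothetical and the data are a toy example.

```python
import numpy as np


def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarities between rows (samples) of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            den = (X[i] + X[j]).sum()
            D[i, j] = D[j, i] = np.abs(X[i] - X[j]).sum() / den if den > 0 else 0.0
    return D


def pcoa(D, n_axes=2):
    """Classical principal coordinates analysis of a dissimilarity matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the matrix of -0.5 * squared dissimilarities.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1][:n_axes]
    # Bray-Curtis is not Euclidean, so negative eigenvalues are clipped here.
    scale = np.sqrt(np.clip(eigval[order], 0.0, None))
    return eigvec[:, order] * scale


def trajectories(coords, microcosm, day):
    """Group ordination coordinates by microcosm, in chronological order."""
    lines = {}
    for m in sorted(set(microcosm)):
        idx = [i for i, label in enumerate(microcosm) if label == m]
        idx.sort(key=lambda i: day[i])
        lines[m] = coords[idx]
    return lines


X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = bray_curtis(X)
coords = pcoa(D)
lines = trajectories(np.arange(6.0).reshape(3, 2), ["a", "a", "b"], [5, 1, 3])
```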

Permutational multivariate analysis of variance

Sequential PERmutational Multivariate Analysis of VAriance (PERMANOVA) of the samples was conducted on the absolute abundances, including only the samples from days 28 and 50. These sampling points correspond to the time just before the crossover of the selection regime and to the end of the experiment. These days were selected because they were the most likely to capture the composition of stable rather than transient communities. The procedure was carried out with the function adonis from the R package vegan (version 2.5-7) using $10^6$ permutations. The response given to the function was the matrix of one minus the Spearman correlation between the samples (so that it represents a dissimilarity), while the explanatory variables were the selection group (first variable) and the current selection regime (second variable).
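To illustrate the test statistic, the Python sketch below builds the one-minus-Spearman dissimilarity matrix and runs a one-factor PERMANOVA using the usual partition of squared dissimilarities into among- and within-group sums of squares. It is a simplification: adonis fits several terms sequentially, whereas this sketch handles a single factor. The function names and the toy data are hypothetical.

```python
import numpy as np


def average_ranks(v):
    """Ranks 1..n, with tied values given their average rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def spearman_dissimilarity(samples):
    """1 - Spearman correlation between samples (rows of a samples x OTUs matrix)."""
    ranked = np.array([average_ranks(row) for row in np.asarray(samples, dtype=float)])
    return 1.0 - np.corrcoef(ranked)


def pseudo_f(D, groups):
    """One-factor PERMANOVA pseudo-F from a dissimilarity matrix."""
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    ss_total = (D[np.triu_indices(n, k=1)] ** 2).sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = D[np.ix_(idx, idx)]
        ss_within += (sub[np.triu_indices(len(idx), k=1)] ** 2).sum() / len(idx)
    a = len(labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(D, groups, n_perm=999, rng=None):
    """Pseudo-F and permutation p-value for a single grouping factor."""
    rng = np.random.default_rng() if rng is None else rng
    f_obs = pseudo_f(D, groups)
    hits = sum(pseudo_f(D, rng.permutation(groups)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)


rng = np.random.default_rng(0)
base = np.arange(1.0, 11.0)
samples = np.vstack([base + rng.normal(0, 1.0, (5, 10)),
                     base[::-1] + rng.normal(0, 1.0, (5, 10))])
D = spearman_dissimilarity(samples)
f_obs, p = permanova(D, ["RK"] * 5 + ["KR"] * 5, n_perm=199, rng=rng)
```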

Network visualization

The networks were plotted with the R package igraph (version 1.2.6)⁵⁶. Network modules were identified with the walktrap²⁵ algorithm implemented in igraph, with the setting steps=20, using only the positive edges. Afterwards, the negative edges were added and the networks were plotted with the resulting community labels.
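The Python sketch below shows two ingredients of this step: restricting a signed association matrix to its positive edges, and the random-walk distance between vertices on which walktrap's agglomerative merging is based, with the walk length $t$ playing the role of steps. It is not igraph's implementation: the merging and the modularity-based cut are omitted, no self-loops are added, every node is assumed to keep at least one positive edge, and the function names are hypothetical.

```python
import numpy as np


def positive_subgraph(weights):
    """Keep only positive associations; negative and zero entries become non-edges."""
    W = np.asarray(weights, dtype=float)
    return np.where(W > 0, W, 0.0)


def walktrap_distance(A, t):
    """Distance r_ij = || D^(-1/2) (P^t_i. - P^t_j.) || for all vertex pairs.

    A: symmetric non-negative adjacency matrix without isolated vertices.
    t: random-walk length (the 'steps' setting).
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    P = A / deg[:, None]                   # transition matrix of the random walk
    Pt = np.linalg.matrix_power(P, t)      # t-step transition probabilities
    scaled = Pt / np.sqrt(deg)[None, :]    # weight column k by 1/sqrt(d(k))
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# Two positive triangles (0-1-2 and 3-4-5) joined by edge 2-3; the -0.5 entry
# between 0 and 5 is a negative association and is dropped.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
W[0, 5] = W[5, 0] = -0.5
R = walktrap_distance(positive_subgraph(W), t=3)
```

Vertices in the same triangle end up closer to each other than to vertices in the other triangle, which is what drives them into the same module.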

The time dynamics of the networks were visualised by taking the former network and adjusting the node colour and size, as well as the edge colour. For this, a particular combination of selection group (e.g. RK) and resource supply (e.g. H) was chosen. Further, let $x_{i,j,k}$ be the abundance of OTU $k$ at sampling day $i$ in microcosm $j$. As there are three replicates, $j = 1,2,3$. If the underlying network was created with the Pearson correlation, we denote the day mean $x_{i,.,k}$ as the average over the replicates, that is:

$$x_{i,.,k} = \frac{x_{i,1,k}+x_{i,2,k}+x_{i,3,k}}{3}. \tag{2}$$

The time series mean of OTU $k$, $x_{.,.,k}$, is the mean of these daily means over all sampling days,

$$x_{.,.,k} = \frac{\sum_{i=1}^{N} x_{i,.,k}}{N}, \tag{3}$$

where $N$ denotes the number of sampling days. The associated standard deviation $\sigma_k$ is given by:

$$\sigma_k = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{i,.,k}-x_{.,.,k}\right)^2}. \tag{4}$$

The z-value of the abundance of OTU k at day i is then:

$$z_{i,k} = \frac{x_{i,.,k}-x_{.,.,k}}{\sigma_k}. \tag{5}$$

This value is used to map the node sizes and colours. The node for OTU $k$ at sampling day $i$ has the size $a + b\cdot\left|z_{i,k}\right|$, where $a$ and $b$ are constants. Furthermore, the same node is coloured:

Black if $z_{i,k} < -1$. This indicates that the OTU had a lower abundance that day than its time series average (by more than one standard deviation).
Grey if $-1 \le z_{i,k} \le 1$. This indicates that the OTU had about the same abundance that day as its time series average.
Orange if $z_{i,k} > 1$. This indicates that the OTU had a higher abundance that day than its time series average (by more than one standard deviation).
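The node mapping in Eqs. (2)-(5) and the colour rule above can be sketched in Python as follows. The array layout (days × replicates × OTUs) and the values of the constants $a$ and $b$ are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np


def node_z_values(x):
    """z_{i,k} from Eqs. (2)-(5).

    x: array of shape (N sampling days, 3 replicates, K OTUs).
    Returns an array of shape (N, K).
    """
    x = np.asarray(x, dtype=float)
    day_mean = x.mean(axis=1)                # Eq. (2): mean over replicates
    series_mean = day_mean.mean(axis=0)      # Eq. (3): mean over sampling days
    sigma = day_mean.std(axis=0)             # Eq. (4): ddof=0, i.e. the 1/N form
    return (day_mean - series_mean) / sigma  # Eq. (5)


def node_size(z, a=5.0, b=3.0):
    """Node size a + b*|z|; a and b are hypothetical constants."""
    return a + b * np.abs(z)


def node_colour(z):
    """Black below -1, orange above 1, grey otherwise."""
    if z < -1:
        return "black"
    if z > 1:
        return "orange"
    return "grey"


# One OTU over four sampling days; the three replicates agree on each day.
x = np.repeat(np.array([1.0, 2.0, 3.0, 4.0])[:, None, None], 3, axis=1)
z = node_z_values(x)[:, 0]
colours = [node_colour(v) for v in z]
```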

Furthermore, the edge colour depends on the product of the z-values of the two participating nodes. Hence, the edge between OTU $k$ and OTU $l$ at day $i$ has the colour:

Red if $z_{i,k}\cdot z_{i,l} < -0.3$. This shows a contribution to a negative interaction.
Grey if $-0.3 \le z_{i,k}\cdot z_{i,l} \le 0.3$. This shows no major contribution to either a positive or a negative interaction.
Blue if $z_{i,k}\cdot z_{i,l} > 0.3$. This shows a contribution to a positive interaction.
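The edge rule can be sketched in Python as below; the function names and the toy z-values are hypothetical, and the cut-off of 0.3 is taken from the list above.

```python
def edge_colour(z_k, z_l, cutoff=0.3):
    """Colour of the edge between OTUs k and l on one day, from z_{i,k} * z_{i,l}."""
    product = z_k * z_l
    if product < -cutoff:
        return "red"     # this day pushes the correlation down
    if product > cutoff:
        return "blue"    # this day pushes the correlation up
    return "grey"        # no major contribution either way


def day_edge_colours(z_day, edges, cutoff=0.3):
    """Colour every edge (k, l) of the network for one sampling day.

    z_day: sequence or mapping from OTU index to its z-value on that day.
    """
    return {(k, l): edge_colour(z_day[k], z_day[l], cutoff) for k, l in edges}


colours = day_edge_colours([1.5, -1.0, 0.5, 0.5], [(0, 1), (2, 3), (0, 2)])
```

Here the products are -1.5, 0.25 and 0.75, giving a red, a grey and a blue edge respectively.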

Our approach is motivated by the fact that the Pearson correlation $\rho_{k,l}$ of the day means of OTU $k$ and OTU $l$, with the standard deviations defined as in Eq. (4), is given by:

$$\rho_{k,l} = \frac{1}{N}\sum_{i=1}^{N} z_{i,k}\cdot z_{i,l}. \tag{6}$$
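Equation (6) can be checked numerically: with population standard deviations, the mean of the day-wise products of z-values equals the Pearson correlation of the day means, so each day's product is that day's share of the correlation. A small Python check on synthetic day means (not data from the experiment; the helper names are hypothetical):

```python
import numpy as np


def z_scores(v):
    """Standardise with the population standard deviation, as in Eq. (4)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def rho_from_z(u, v):
    """Eq. (6): the mean over days of the products z_{i,k} * z_{i,l}."""
    return float(np.mean(z_scores(u) * z_scores(v)))


rng = np.random.default_rng(42)
u = rng.normal(size=12)             # synthetic day means of OTU k
v = 0.5 * u + rng.normal(size=12)   # synthetic day means of OTU l
contributions = z_scores(u) * z_scores(v)  # one value per day; these set the edge colours
```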

For the Spearman correlation, the visualization is based on the ranks of the abundance values of each OTU across samples. Hence, the ranks $r_{i,j,k}$ are used instead of the raw abundances $x_{i,j,k}$ when calculating the day mean, and all subsequent calculations and mappings are the same. When there is only one replicate, the quantity $\rho_{k,l}$ is then the Spearman correlation of the abundances of OTU $k$ and OTU $l$.
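A Python sketch of the rank-based variant follows. It assumes that each OTU is ranked over all of its values in the chosen selection group and resource supply (all days and replicates), with ties given average ranks; this ranking scope and the function names are assumptions. With a single replicate, the mean product of the rank z-values reproduces Spearman's $\rho$, here $1 - 6\sum d^2/(n(n^2-1)) = 1 - 84/120 = 0.3$.

```python
import numpy as np


def average_ranks(v):
    """Ranks 1..n, with tied values given their average rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def rank_z_values(x):
    """Spearman variant: replace abundances by ranks, then apply Eqs. (2)-(5).

    x: array of shape (N days, R replicates, K OTUs). Each OTU is ranked over
    all N*R of its values (an assumption about the ranking scope).
    """
    x = np.asarray(x, dtype=float)
    n_days, n_reps, n_otus = x.shape
    r = np.empty_like(x)
    for k in range(n_otus):
        r[:, :, k] = average_ranks(x[:, :, k].ravel()).reshape(n_days, n_reps)
    day_mean = r.mean(axis=1)
    return (day_mean - day_mean.mean(axis=0)) / day_mean.std(axis=0)


u = [1.0, 3.0, 2.0, 5.0, 4.0]
v = [2.0, 1.0, 4.0, 3.0, 5.0]
x = np.stack([u, v], axis=1)[:, None, :]   # shape (5 days, 1 replicate, 2 OTUs)
z = rank_z_values(x)
rho = float(np.mean(z[:, 0] * z[:, 1]))
```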

Source: Ecology - nature.com