Benthic ecosystem cascade effects in Antarctica using Bayesian network inference

The data for this study were collected in the austral summer of 2016, on board the BAS research ship RRS James Clark Ross³⁸. Biological abundance data were taken from photographic images from Brasier et al.¹⁰ (see methodology within). The images were taken with a Shallow Underwater Camera System (SUCS), in transects of 10 photographs, 10 m apart. Replicate transects were separated by 100 m, at water depths of 500 m, 750 m and 1000 m¹⁰. Each photograph in the analysis was 0.51 m² in area. In the analyses, ‘VME unknown’ was biological material unidentifiable to phylum, but distinguishable as vulnerable marine ecosystem (VME) taxa, e.g., branched or budding fragments which could be bryozoan or cnidarian species. The percentage cover of encrusting species was recorded by Brasier et al.¹⁰, as it was not always possible to distinguish individual colonies of colonising taxa¹⁰. Physical variables, such as substrate texture, were also observed from the SUCS images¹⁰.

Taxonomic resolution of the data originally collected depended on the ability to distinguish the organism in the images¹⁰. For our analysis, the taxonomic level used for each group depended on the abundance of that group, as lower-abundance, finer-scale taxonomic groupings would be zero-inflated and so could not be used within our methodology (cf. Milns et al.³⁹). Taxonomic groupings included were Annelida, Arthropoda, Cnidaria, Echinodermata, Mollusca, Bryozoa, Actinopteri and Porifera. The echinoderm taxa Ophiuroidea and Euryalida also occurred in large enough numbers to be analysed as separate groups. The taxa in each of these groupings can be seen in Brasier et al.¹⁰.

The raw data (taxon ID from photographs) were highly zero-inflated (84.9% of entries), so data grouping was performed to capture the fine-scale (individual photographs at ∼cm scale) and large-scale (replicate transects at ∼km scale) ecology of the ecosystems. For the fine-scale network there were 527 samples (abundances from individual photographs) and twelve nodes. Four were physical nodes (Depth, Region, Substrate and Substrate Texture; Table 1), two were functions of the specimens present (percent encrusting), five were taxa classified to phylum level (Arthropoda, Bryozoa, Porifera, Cnidaria and Echinodermata) and there was one bin-group (VME Unknown; Supplementary Table 2).

Table 1 Network properties for the two networks found. Chains are defined as series of dependencies containing more than two nodes. Link density is the mean number of connections per node; connectance is the number of connections divided by [number of nodes × (number of nodes − 1)].

For the large-scale network, the data from all the photographs in each event (up to three replicate transects) were combined to form 21 samples with 16 nodes. For factor nodes (Region, Texture and Substrate) the modal value was taken, taxa nodes were summed, and the mean was used for Percent Encrusting and Depth. Texture was excluded from analyses due to the high zero-count. Two nodes were functions of the specimens present (percent encrusting), ten taxa were analysed (Annelida, Arthropoda, Cnidaria, Echinodermata, Mollusca, Bryozoa, Actinopteri, Ophiuroidea, Euryalida, Porifera) and one bin-group (VME Unknown) was used. The combination of the two networks enabled the investigation of both physical and biotic interactions across multiple spatial scales.
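The aggregation rule described above (modal value for factor nodes, sum for taxa, mean for Percent Encrusting and Depth) can be sketched as follows. This is a minimal illustration rather than the authors' R code: the field names (`Region`, `Substrate`, `Depth`, `PercentEncrusting`) and the dictionary layout are assumptions made for the example.

```python
from statistics import mean, mode

# Hypothetical field names; the real column names are not given in the text.
FACTOR_NODES = ("Region", "Substrate")        # modal value per event
MEAN_NODES = ("Depth", "PercentEncrusting")   # mean per event


def aggregate_event(photos):
    """Combine the per-photograph records of one event into a single sample.

    `photos` is a list of dicts sharing the same keys. Any key that is not a
    factor node or a mean node is treated as a taxon count and summed.
    """
    sample = {}
    for key in photos[0]:
        values = [photo[key] for photo in photos]
        if key in FACTOR_NODES:
            sample[key] = mode(values)
        elif key in MEAN_NODES:
            sample[key] = mean(values)
        else:
            sample[key] = sum(values)
    return sample
```

Applying `aggregate_event` to the photographs of each event in turn would give one row per event, which is the shape of input the large-scale network needs.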

We chose these relatively coarse taxonomic groupings in order to maximise the statistical power of our analyses. This statistical power enabled us to reconstruct a complex network of dependencies, thus revealing the subtle relationships between different taxa. The coarse grouping is a limitation of this method, because the homogenisation of some diverse groups, such as echinoderms, groups together organisms with variable life modes and traits (and therefore differing relationships with other taxa). However, the coarse level of identification was necessary with this volume of data (over 500 photographs), and the patchiness and rarity of many taxa. Future studies involving a higher density of photographs will allow for genus or species level separation, and/or an analysis based upon functional traits classification.

Analysis

One approach to understanding how ecosystems function is to consider the ecosystem as a network (cf. Milns et al.³⁸). Different taxa or groups of taxa are considered ‘nodes’ and their interactions are described as ‘connections’ (more usually known as ‘edges’ in Bayesian networks), which link interacting taxa together. A node from which a connection feeds into another node is called a ‘parent’ of that node. Work with Bayesian network inference algorithms (BNIAs) has centred on gene regulatory networks¹⁷ and neural information flow networks, with more recent applications to ecological and palaeoecological networks^18,39,40,41. It is important to note that the structure produced by the BNIA reflects the associations caused by co-localisations (for example, two taxa which both have a high abundance), rather than a specific interaction such as predation. By using BNIA, direct dependencies between taxa can be detected, minimising auto-correlation between two nodes. For example, if A depends on B, which depends on C, there could be a correlation between A and C. However, this correlation would not represent an interaction or association between A and C, merely the two correlations between A and B and between B and C. BNIA identifies only the realised dependencies, so that indirect correlations of this kind are not reported as connections.

Bayesian network inference was performed in Banjo v2.0.0¹⁶, a publicly available Java-based BNIA^39,41. Banjo uses uniform priors, so boundaries for the different discretized groups were chosen to ensure even splitting between the groups. Discretized data were input into Banjo, which then generated a random network based on the input variables. A greedy search was then performed to find a more likely network than the random one generated. This search was repeated 10 million times for each set of input data and the most probable network was then output. The maximum number of parents was set to 3 to limit artefacts¹⁹.

The BNIA used requires discrete data, which ensures data noise is masked, and only the relative densities of each taxon are important³⁹. We split the data into three intervals: zero counts, low counts and high counts. Low counts consisted of counts below the median for the taxon group and high counts were counts over the median. Medians were used rather than means because for some groups the high counts were very high, and using the mean would result in a very small number of samples grouped in the highest interval (cf. Milns et al.³⁹). A larger number of bins retains more of the information present in the dataset, while fewer bins provide more statistical power and greater noise masking. Yu¹⁹ has shown that for ecological data sets three bins is a good balance. Zero was treated as a separate state because the presence of one individual is very different to a complete absence, in contrast to zero gene expression, for example. Data preparation for Banjo (grouping and discretization) was carried out in R⁴², as was the statistical analysis of the data. Further analysis of Banjo outputs, when required, used the functional language Haskell⁴³. The scripts are available on GitHub (github.com/egmitchell/bootstrap).
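The three-state discretization described above can be illustrated with a short sketch. It is not the authors' R script, and it rests on two assumptions the text leaves open: the median is taken over the non-zero counts of the taxon, and a count exactly equal to the median is placed in the low bin.

```python
from statistics import median


def discretize(counts):
    """Map raw counts for one taxon to 0 (zero), 1 (low) or 2 (high).

    Assumptions (not stated in the source): the median is computed over the
    non-zero counts only, and counts equal to the median are labelled low.
    """
    nonzero = [c for c in counts if c > 0]
    if not nonzero:
        return [0] * len(counts)
    cut = median(nonzero)
    return [0 if c == 0 else (1 if c <= cut else 2) for c in counts]
```

Because the cut point is a median rather than a mean, a single very large count (such as one photograph dominated by a taxon) does not push almost every other sample into the low bin.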

To minimise bias from outliers, 100 subsamples were bootstrapped at the 95% level by randomly selecting 95% of the total number of samples for each analysis^20,43, and then finding the network for each subsample using Banjo. For each connection, the probability of occurrence across the bootstrapped networks was calculated, and the resulting distribution was analysed to find the number of Gaussian sub-distributions using normal mixture models⁴⁴. This probability distribution was bimodal for each data set, which suggests that there were two groups of connections: those with a low probability of occurrence and those with a high probability of occurrence. The final network for each area was taken to be those connections which were highly probable. The threshold for being labelled ‘highly probable’ depended on the network (as determined by the normal mixture modelling analyses): 53% for the fine-scale network and 51% for the large-scale network. The magnitude of the occurrence rate is indicated in the network by the width of the line depicting the connection.
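The bookkeeping around the bootstrap can be sketched as below. The network learning itself is done by Banjo and is not reproduced here, nor is the normal mixture modelling that sets the threshold. The function names and the representation of a network as a set of `(parent, child)` pairs are assumptions made for this illustration.

```python
import random
from collections import Counter


def subsample(samples, fraction=0.95, rng=random):
    """Draw a random subset (without replacement) of the given fraction."""
    k = round(fraction * len(samples))
    return rng.sample(samples, k)


def edge_occurrence(networks):
    """Fraction of bootstrapped networks containing each connection.

    Each network is a set of (parent, child) pairs. Direction is ignored
    here, so (A, B) and (B, A) count as the same connection.
    """
    counts = Counter()
    for edges in networks:
        counts.update({frozenset(edge) for edge in edges})
    return {pair: n / len(networks) for pair, n in counts.items()}


def highly_probable(occurrence, threshold):
    """Keep the connections whose occurrence rate reaches the threshold."""
    return {pair for pair, p in occurrence.items() if p >= threshold}
```

In use, each of the 100 subsamples would be passed to the structure-learning step, the resulting edge sets collected into `networks`, and `highly_probable` called with the threshold for that network (0.53 or 0.51 in the text).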

The direction of the connection between nodes indicates which node (taxon) has a dependency on the other node (taxon). For each connection, the directionality was taken to be the direction that occurred in the majority of bootstrapped networks. Where there was no clear majority (the directional connection had a probability between 0.4 and 0.6), the connection was treated as bi-directional, indicating a mutual dependency.
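A minimal sketch of the direction rule follows. It assumes that the directional probability is the share of one direction among the bootstrapped networks that contain the connection at all, which the text does not state explicitly. The function name and return values are hypothetical.

```python
def edge_direction(networks, a, b, lower=0.4, upper=0.6):
    """Decide the direction of the connection between nodes a and b.

    Each network is a set of (parent, child) pairs. Returns (a, b) or (b, a)
    for a clear majority, "bidirectional" when the share of a -> b lies
    between `lower` and `upper`, and None if the connection never occurs.
    """
    forward = sum((a, b) in edges for edges in networks)
    backward = sum((b, a) in edges for edges in networks)
    total = forward + backward
    if total == 0:
        return None
    share = forward / total
    if share > upper:
        return (a, b)
    if share < lower:
        return (b, a)
    return "bidirectional"
```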

The influence score (IS) can be used to gauge the type and strength of the interaction between two nodes. An IS of 1 corresponds to a positive correlation: when node 1 is high, node 2 will be high. An IS of −1 corresponds to a negative correlation: when node 1 is high, node 2 will be low. An IS of 0 does not mean there is no correlation between the two nodes; it means that the interaction is non-monotonic, so that node 1 is sometimes positively and sometimes negatively correlated with node 2. The mean IS for each connection was calculated for each sample area.

Contingency test filtering

In order to avoid Type I errors introduced by high zero counts, which are common in ecological data sets, we excluded rare taxa, which were found in under 33% of the grid cells. Note that this method of exclusion could mean that a taxon with high abundance in a very limited area is excluded from analyses. To further guard against Type I errors, we also used contingency test filtering, which removed from consideration any connection between two variables whose joint distribution showed no evidence of deviation from the distribution expected from their combined marginal distributions (chi-squared tests, p > 0.25)³⁹. This lenient threshold was used to minimise the chance of removing true dependencies, so that only artefacts such as those found between high zero counts were removed from consideration. The resulting list of links was provided to the BNIA to exclude from consideration.
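The contingency test can be illustrated with a small, dependency-free sketch. It assumes both variables have three states that all occur at least once, so the test has (3 − 1) × (3 − 1) = 4 degrees of freedom, for which the chi-squared upper-tail probability has the closed form exp(−x/2)(1 + x/2). The function names are hypothetical and this is not the authors' implementation.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df4(stat):
    """Upper-tail probability of a chi-squared variable with 4 d.f."""
    return math.exp(-stat / 2) * (1 + stat / 2)


def exclude_link(table, threshold=0.25):
    """True if the pair shows no evidence of dependence (p > threshold)."""
    return p_value_df4(chi_squared(table)) > threshold
```

Pairs for which `exclude_link` returns `True` would go on the list of connections that the structure search is told not to consider.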

Inference

Inferring how one node (taxon or physical variable) is likely to change given that another node is in a given state is done by calculating the probability of node A being in a given state given that node B is in a given state. All nodes on the chain of dependencies between A and B are included in the calculation:

$$\mathrm{P}\left( \mathrm{A|B} \right) = \sum\limits_{n = 1}^{n = N - 1} \frac{\prod\nolimits_{s = 1}^{s = S} \mathrm{P}(\mathrm{B}_n|\mathrm{A}_{n + 1})}{\sum\nolimits_{m = 1}^{N} \mathrm{P}(\mathrm{B}|\mathrm{A}_m)\mathrm{P}(\mathrm{A}_m)}$$

N is the total number of nodes in the chain, and n and m are the indices for the chain of nodes of length N connecting the first and end nodes. S is the number of discrete states for each node, indexed by s. In order to infer the likely effect of one taxon’s abundance (A) on another’s abundance (B), the probabilities of all taxa that occur in the network between A and B have to be taken into account. For example, consider the probability that B is in a Zero abundance state given that A is in a High abundance state, where A and B are connected through a dependency of B on C and of C on A. The probability of C being in each of its states is calculated given a High abundance state for A; the probability of B being in a Zero state is then calculated for each state of C, weighted by the probability of that state of C, and the results are summed to give the probability of B being Zero given that A is High. Working out the inferred change in B given a change in A from High to Zero involves taking the difference between the probabilities for each state of B given A is high and the probabilities for each state of B given A is low. For the code used to generate these inferred probabilities, please see ref. ⁴⁵.
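The worked example above, a chain in which C depends on A and B depends on C, can be sketched as a marginalisation over the states of C. This is an illustration only and not the code of ref. ⁴⁵: the function names and the nested-dictionary layout of the conditional probability tables are assumptions, and any probabilities passed in would be made up for the example.

```python
def chain_inference(p_c_given_a, p_b_given_c, a_state):
    """P(B = b | A = a_state) for every state b, for a chain A -> C -> B.

    p_c_given_a[a][c] is P(C = c | A = a); p_b_given_c[c][b] is
    P(B = b | C = c). The states of C are summed out.
    """
    c_probs = p_c_given_a[a_state]
    b_states = next(iter(p_b_given_c.values())).keys()
    return {
        b: sum(c_probs[c] * p_b_given_c[c][b] for c in c_probs)
        for b in b_states
    }


def inferred_change(p_c_given_a, p_b_given_c, a_from, a_to):
    """Change in each state probability of B when A moves from a_from to a_to."""
    before = chain_inference(p_c_given_a, p_b_given_c, a_from)
    after = chain_inference(p_c_given_a, p_b_given_c, a_to)
    return {b: after[b] - before[b] for b in before}
```

For longer chains the same step can be applied repeatedly, feeding the distribution obtained for one intermediate node into the conditional table of the next.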

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source: Ecology - nature.com
