Data input
ANTs can process two types of data: (1) data representing the directed interactions of individuals (e.g. grooming) or their associations (e.g. proximity), and (2) data representing individual attributes (sex, age, dominance rank, etc.).
Interaction and association data can be input in the form of a matrix or of one or more data frames. The data frame structure depends on the type of protocol the user wants to follow. For network permutations, data frames must be in an edge list format with at least two columns, one indicating the actor and the other the receiver. An additional column may indicate the weights of interactions. These data frames allow the user to directly input data collected in the field. For the data stream permutation approach, data are also presented in data frame format, but the data frames are not edge lists because they contain additional information in extra columns. For data stream permutations concerning focal observations16, i.e. data obtained from following a specific individual over a certain amount of time23, two extra columns are required in addition to those indicating the givers and receivers of a behaviour: one indicating the focal individual and another indicating the corresponding focal session. For data stream permutations on group follow observations6, i.e. records of individual associations at specific locations and times, data frames have to be in a ‘linear mode’ identical to that of SOCPROG (i.e. each line corresponds to the observation of one individual), with additional columns indicating the different ‘control’ factors (see “Permutations” section) such as the date, time of day or geographical location at which the interaction occurred.
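To make the edge list format concrete, the following minimal Python sketch converts actor/receiver/weight rows into a directed, weighted matrix. It is an illustration only and does not use the ANTs R API; the function name `edge_list_to_matrix` and the example data are hypothetical.

```python
def edge_list_to_matrix(edges):
    """Build a directed, weighted matrix from (actor, receiver, weight) rows.

    Rows of the matrix are actors, columns are receivers. Repeated
    actor/receiver pairs are summed. (Hypothetical helper, not an ANTs function.)
    """
    # Collect every individual that appears as actor or receiver.
    ids = sorted({ind for actor, receiver, _ in edges for ind in (actor, receiver)})
    index = {ind: i for i, ind in enumerate(ids)}
    matrix = [[0.0] * len(ids) for _ in ids]
    for actor, receiver, weight in edges:
        matrix[index[actor]][index[receiver]] += weight
    return ids, matrix


# Made-up grooming records: actor, receiver, number of grooming bouts.
edges = [("A", "B", 2), ("B", "A", 1), ("A", "C", 3), ("A", "B", 1)]
ids, matrix = edge_list_to_matrix(edges)
for ind, row in zip(ids, matrix):
    print(ind, row)
```

Data stream formats would carry further columns (focal individual and focal session, or control factors such as date and location), which is why they cannot be reduced to an edge list before permutation.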
Individual attributes (sex, age, dominance rank, hormone levels, etc.) must also be supplied in data frame format, with one row for every individual present in the interaction or association data. Each row holds the attribute(s) of a single individual.
Inputting these two types of data (interactions/associations and individual attributes) may enable the user to (1) permute and/or compute network measures on data representing individuals’ interactions or associations and (2) store node network measures with ANTs functions in the data frame(s) of individual attributes. This makes it possible to study how these node network measures are related to individuals’ attributes.
When performing the multiple networks analytical protocol, the user has to create an R list object in which each element stores the interaction/association data representing a single network. This list must contain a single data format of interactions/associations (i.e. only edge lists, only group follow associations or only focal sampling associations). Optionally, the user can create a second R list object with the attributes of the individuals present in the corresponding list of interactions/associations (e.g. the data frame of individual attributes in element 1 corresponds to the individuals present in the interactions/associations in element 1, etc.). This way, permutations are generated independently in each network (e.g. 1,000 permutations in network 1, 1,000 permutations in network 2, etc.).
Testing data collection robustness
One of the main issues with regard to social network analysis and the study of animal groups is the quality of data collection (time of observation), as observation biases (e.g. some individuals are more frequently observed than others) can generate unreliable statistical results24, 25. Ideally, the data collection protocol is planned around the needs of the intended SNA before data are collected. The following questions must be answered: Do I observe all group members equally? Am I using the best method to limit the disturbance of animal behaviour and interactions? The choice of observation period is also a key factor, as some interindividual associations or interactions are rare and/or difficult to observe over the short term but are still important to attain the objectives of the study. However, this is not always the case, as scientists often collect data before planning their analyses. ANTs meets the needs of these differing approaches by offering two different protocols to assess data collection robustness:
1. Lusseau, et al.24 protocol to assess the robustness of node measures through bootstrapping.
2. Balasubramaniam, et al.25 protocol to assess the robustness of global measures through observation deletion simulations.
For further information on the use of these different protocols, please refer to ANTs R documentation concerning functions in the ‘sampling.’ family.
Controlling for time heterogeneity
It is sometimes difficult to obtain the same number of observations per individual. ANTs enables users to control for time heterogeneity in different ways through the use of different association indices, namely the generalised affiliation index, the simple ratio index, the half-weight index or the square root index6. For further instructions on the use of these different indices, please refer to ANTs R documentation concerning the functions in the ‘assoc.’ family.
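As an illustration of how three of these indices correct for unequal observation effort, the sketch below implements their standard textbook definitions in Python. It is not ANTs code: the function names are hypothetical, and the argument convention is an assumption (x = sampling periods in which A and B were seen associated, y_a and y_b = periods in which only A or only B was identified, y_ab = periods in which both were identified but not associated). The generalised affiliation index is omitted because it depends on a fitted model.

```python
import math


def simple_ratio(x, y_a, y_b, y_ab=0):
    """Simple ratio index: x / (x + y_ab + y_a + y_b)."""
    return x / (x + y_ab + y_a + y_b)


def half_weight(x, y_a, y_b, y_ab=0):
    """Half-weight index: periods with only one individual count half."""
    return x / (x + y_ab + 0.5 * (y_a + y_b))


def square_root(x, y_a, y_b, y_ab=0):
    """Square root index: x / sqrt((x + y_ab + y_a) * (x + y_ab + y_b))."""
    return x / math.sqrt((x + y_ab + y_a) * (x + y_ab + y_b))


# Made-up counts for one dyad: seen together 6 times, A alone 2, B alone 4.
x, y_a, y_b = 6, 2, 4
print("SRI:", simple_ratio(x, y_a, y_b))
print("HWI:", round(half_weight(x, y_a, y_b), 3))
print("SQRT:", round(square_root(x, y_a, y_b), 3))
```

The sketch assumes that each dyad member has been observed at least once, so that the denominators are non-zero.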
Computing network measures
Three types of network measures can be identified depending on the level of organisation: global measures, polyadic measures, and node measures. In ANTs, all these measures are grouped under the function family ‘met’. All the node measures available in ANTs are summarised in Table 1. The measures provided in ANTs are those commonly used in animal social network analyses6, 22, 26,27,28.
Global measures (e.g. network diameter) are used to study the overall network and obtain valuable information regarding network efficiency, resilience, clustering, etc. Polyadic measures (e.g. assortativity) allow the study of interaction patterns between individuals. These measures provide information about how individuals interact according to their attributes. Node measures (e.g. strength) are the most frequently used measures in animal research. Among other things, node measures inform users about the centrality of an individual, the number of alters it has and/or its activity according to individual attributes, and reveal patterns that are common to individuals with similar attributes. By giving access to global, polyadic and node measures, we aim to enable users to adopt a multilevel approach and thereby understand the centrality of individuals in a group, the patterns of interaction between them and the impact of these two levels on the global network structure22, 29.
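The difference between node and global measures can be sketched in a few lines of Python. This is an illustration on a made-up grooming matrix, not the ANTs ‘met’ functions; the function names are hypothetical, and density is used here as a simple stand-in for a global measure.

```python
def out_strength(matrix):
    """Node measure: total weight of the links each node sends."""
    return [sum(row) for row in matrix]


def in_strength(matrix):
    """Node measure: total weight of the links each node receives."""
    return [sum(column) for column in zip(*matrix)]


def out_degree(matrix):
    """Node measure: number of alters each node sends links to."""
    return [sum(1 for weight in row if weight > 0) for row in matrix]


def density(matrix):
    """Global measure: proportion of possible directed links that exist."""
    n = len(matrix)
    links = sum(
        1 for i in range(n) for j in range(n) if i != j and matrix[i][j] > 0
    )
    return links / (n * (n - 1))


# Rows are groomers, columns are recipients (made-up values).
grooming = [
    [0, 3, 3],
    [1, 0, 0],
    [0, 0, 0],
]
print("out-strength:", out_strength(grooming))
print("in-strength:", in_strength(grooming))
print("out-degree:", out_degree(grooming))
print("density:", density(grooming))
```

Node measures return one value per individual and can therefore be stored alongside individual attributes, whereas a global measure returns a single value per network.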
For more details on the different types of measures, their mathematical formulae, interpretation, limitations and past use in animal research, see Whitehead6, Sueur, et al.26, Sosa, et al.22, Sosa29 and refer to ANTs R documentation.
Permutations
When considering data robustness, permutations can be used to limit the effect of observation biases and improve the reliability of results obtained by SNA (i.e. to reduce the risk of type I and type II errors). Indeed, with the exception of some specific cases such as experiments in social insects, where individuals may be tracked continuously, it is usually assumed, when examining inter-individual interactions within a group or a population, that not all interactions or all individuals are observed, that the times of observation vary from one individual to another, and that the data collected are intrinsically dependent. For these reasons, permutation tests are needed to account for the non-independence of the data before performing inferential statistical tests, which assume data independence16.
The Null Model (NM) approach via permutation is one of the many current possibilities to test statistical hypotheses15. It allows users to perform analyses by creating random data sets from the observed data. The observed measure of interest X (e.g. coefficient of correlation) is compared to the posterior distribution obtained from the random data sets, and the user assesses whether X differs significantly from this distribution by calculating the proportion of random values that are greater or smaller than the observed value. The NM approach can be applied in different ways. ANTs allows for this by adapting the permutations according to the type of data collected (i.e. pre-network, or data stream, permutations for data on associations and network permutations for data on interactions) and the research question (i.e. permuting nodes when examining node measures or permuting links when examining polyadic or global measures).
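The comparison of an observed value with the values obtained from the random data sets can be sketched as follows. This Python example is illustrative only: the function name is hypothetical, the permuted values are made up, and the two-sided rule used here (twice the smaller tail, capped at 1) is one common convention, assumed rather than taken from ANTs.

```python
def permutation_p_values(observed, permuted):
    """Return (left, right, two_sided) p-values for an observed statistic.

    left  = proportion of permuted values <= observed
    right = proportion of permuted values >= observed
    two_sided = 2 * min(left, right), capped at 1 (assumed convention).
    """
    n = len(permuted)
    p_left = sum(1 for value in permuted if value <= observed) / n
    p_right = sum(1 for value in permuted if value >= observed) / n
    p_two_sided = min(1.0, 2 * min(p_left, p_right))
    return p_left, p_right, p_two_sided


# Made-up correlation coefficients from ten random data sets.
permuted = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.15, -0.05, 0.2, 0.25]
p_left, p_right, p_two_sided = permutation_p_values(0.28, permuted)
print(p_left, p_right, p_two_sided)
```

In practice many more permutations are run (e.g. 1,000 or more), since the smallest attainable p-value is limited by the number of random data sets.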
Data stream and node network permutations are two of the most commonly used permutation methods to build null models in animal social network analysis. A description of these methods is presented by Puga-Gonzalez et al. (submitted). Data stream permutations were initially used to test whether individuals in a social population have a preference for association with certain partners rather than with others27, 30. One of the advantages of this method is that it can control for different factors such as location. It is therefore possible to test whether non-random associations are due to individuals’ social preference or result from a preference for the same habitat or location27.
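A single data stream permutation step for group follow data can be sketched as below, in the spirit of the swap procedure described by Bejder, et al.30: two individuals are exchanged between two groups that share the same control factor, so that group sizes and each individual's number of sightings stay unchanged. This is a simplified Python illustration, not the ANTs ‘perm.ds’ implementation; the function name, the row layout (individual, group, control) and the data are assumptions.

```python
import random


def data_stream_swap(observations, rng, max_tries=1000):
    """Swap two individuals between two groups with the same control value.

    observations: list of [individual, group, control] rows ('linear mode').
    The list is modified in place. Returns True if a swap was made.
    (Hypothetical helper, not an ANTs function.)
    """
    # Current membership of every group, used to avoid duplicates.
    members = {}
    for individual, group, _ in observations:
        members.setdefault(group, set()).add(individual)

    for _ in range(max_tries):
        a, b = rng.sample(range(len(observations)), 2)
        ind_a, group_a, control_a = observations[a]
        ind_b, group_b, control_b = observations[b]
        valid = (
            control_a == control_b               # same location, date, etc.
            and group_a != group_b               # two different groups
            and ind_a not in members[group_b]    # no duplicate after swap
            and ind_b not in members[group_a]
        )
        if valid:
            observations[a][0], observations[b][0] = ind_b, ind_a
            return True
    return False


# Made-up sightings: individual, group identifier, location (control factor).
observations = [
    ["A", "g1", "north"], ["B", "g1", "north"],
    ["C", "g2", "north"], ["D", "g2", "north"],
    ["A", "g3", "south"], ["C", "g3", "south"],
    ["B", "g4", "south"], ["D", "g4", "south"],
]
original = [row[:] for row in observations]
rng = random.Random(1)
swaps_done = sum(data_stream_swap(observations, rng) for _ in range(100))
print(swaps_done, "swaps performed")
```

Because swaps are restricted to groups sharing the same control value, any association pattern that survives comparison with the permuted data cannot be attributed solely to shared use of a location.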
Node network permutation is the other commonly used method to test network-related hypotheses in animal research. Node permutations have mainly been used to compare two matrices (or networks) involving the same group of individuals, i.e. matrix correlations. In this case, the values entered in the cells of the matrices are (un)directed behaviours (e.g. grooming or playing). In contrast to gambit-of-the-group data, (un)directed behaviours are usually collected via focal sampling, scan sampling, or ad libitum sampling23. During node permutations, the identity of the nodes is redistributed at each permutation whilst the node metric is kept constant. This allows users to test whether a specific network metric is associated with a specific node attribute (e.g. whether females groom more than males), or whether behaviours are reciprocated or directed to individuals with a specific trait (e.g. grooming directed up the dominance hierarchy). All of the permutation approaches available in ANTs are in the function family ‘perm’, which has two subclasses, ‘perm.ds’ and ‘perm.net’, for data stream and network permutations, respectively. ANTs can perform data stream permutations for group follow and focal sampling data collection protocols. Network permutations can be performed on (1) node label(s) (with labels’ dependency maintained or not), (2) links, (3) link weights, and (4) swaps of link weights between categories. Among these different types of permutations, node label (ESM Appendix 1) and data stream (ESM Appendix 2) permutations are probably the most commonly used approaches in animal network analysis. For this reason, we developed a specific workflow to allow their use in ANTs (ESM Appendix 1 and ESM Appendix 2) for the study of single31 or multiple networks9, 13 (for network comparisons or time-aggregated analyses).
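A node label permutation can be sketched as follows: the node metric (here, strength) is kept fixed while the attribute labels (here, sex) are shuffled among nodes, and the statistic is recomputed each time. This Python example is an illustration under stated assumptions, not the ANTs ‘perm.net’ workflow; the function names, the test statistic (difference in mean strength between females and males) and the data are hypothetical.

```python
import random


def mean_difference(values, labels, group_a, group_b):
    """Mean of `values` in group_a minus mean of `values` in group_b."""
    a = [v for v, label in zip(values, labels) if label == group_a]
    b = [v for v, label in zip(values, labels) if label == group_b]
    return sum(a) / len(a) - sum(b) / len(b)


def node_label_permutation(values, labels, n_perm, rng, group_a="F", group_b="M"):
    """Return the observed statistic and its values under shuffled labels."""
    observed = mean_difference(values, labels, group_a, group_b)
    shuffled = list(labels)  # copy, so the caller's labels stay untouched
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # redistribute labels; node metric is constant
        null.append(mean_difference(values, shuffled, group_a, group_b))
    return observed, null


# Made-up grooming strength for three females and three males.
strength = [12.0, 9.5, 11.0, 4.0, 5.5, 3.0]
sex = ["F", "F", "F", "M", "M", "M"]
observed, null = node_label_permutation(strength, sex, 1000, random.Random(42))
p_right = sum(1 for value in null if value >= observed) / len(null)
print("observed difference:", round(observed, 3))
print("right-tail p-value:", p_right)
```

Shuffling labels rather than links leaves the network structure intact, which is why this approach suits questions about node attributes rather than about the structure itself.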
To date, ANTs is the only software permitting the use of these approaches in an all-in-one environment and their application for the analysis of multiple networks.
For more details on the different permutations and their applications according to the data collection protocol, the type of behavioural data collected and the research question, see Bejder, et al.30, Whitehead27, Whitehead, et al.32, Croft, et al.28, Farine16, Momigliano, et al.33, Sosa29, ANTs R documentation, ESM Appendix 1 and ESM Appendix 2.
Statistical tests based on data permutations
All the statistical tests available in ANTs are in the function family ‘stat’. The available tests are the correlation test ‘stat.cor’, t-test ‘stat.t’, Linear Model (LM) ‘stat.lm’, Generalised Linear Model (GLM) ‘stat.glm’, Generalised Linear Mixed Models (GLMMs) ‘stat.glmm’, assortativity test ‘stat.assortativity’, TaurK correlation ‘stat.Taurk’ and deletion simulation ‘stat.deletion’. Each ANTs ‘stat.’ function returns an object containing the posterior distribution of the variable tested.
1. Once the permutation test has been performed, the function ‘ant’ allows the user to obtain the statistical results from the output object of any ‘stat’ function. The ‘ant’ function returns a data frame with statistics specific to the type of statistical test run. However, some of these statistics are common to all tests, namely the P-values on the right or left of the distribution and the two-sided P-values.
2. Measures of the ‘effect size’ of the posterior distribution according to the statistic of interest: the 95% confidence interval and the mean of the posterior distribution (see Farine and Whitehead34), together with the histograms of the posterior distribution of the statistic of interest obtained from the permutations.
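The mean and 95% interval of a distribution of permuted statistics can be computed as in the Python sketch below. It is illustrative only and does not reproduce the output of the ‘ant’ function; the function names are hypothetical, and the interval is taken as the 2.5th and 97.5th percentiles with linear interpolation, which is an assumption about the method.

```python
def percentile(sorted_values, q):
    """Percentile by linear interpolation; q is a fraction in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarise_distribution(permuted):
    """Return the mean and the 2.5th/97.5th percentiles of permuted values."""
    ordered = sorted(permuted)
    return {
        "mean": sum(ordered) / len(ordered),
        "ci_lower": percentile(ordered, 0.025),
        "ci_upper": percentile(ordered, 0.975),
    }


# Made-up permuted statistics: the integers 0 to 100.
permuted = [float(value) for value in range(101)]
summary = summarise_distribution(permuted)
print(summary)
```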
Network visualisation
ANTs allows network visualisation from a data frame containing node information and a matrix of interactions/associations. Nodes and links can be parametrised to modify their size and colour and to highlight differences (e.g. females showing higher eigenvector centrality than males). Network layouts are currently based on Barnes Hut repulsion, Hierarchical Repulsion and Force Atlas 2. These layouts are commonly used in animal social network analyses9, 35,36,37; for instance, Force Atlas 2 arranges the graph so that the distance between nodes is inversely proportional to the strength of their association, giving a clear view of who is close to whom. For more details on network visualisation, see ANTs function ‘net.vis’ in the package instructions document.
Source: Ecology - nature.com