
Quantifying individual influence in leading-following behavior of Bechstein’s bats

Inferring leading-following networks

Defining leading-following events

Unlike studies on collective motion, where group movement is tracked continuously5,15, our datasets contain only discrete records of bat appearances at experimental boxes. Quantifying individual influence is, thus, contingent on a rigorous method for inferring leading-following events from discrete recordings of animal occurrences. To denote the information that individuals possess about the location of experimental boxes, we refine the nomenclature used by Kerth and Reckardt3. An individual bat is said to be naïve at time $\mathbf{t}_1$ regarding a given box if it has not been recorded by the reading device in that box at any time $\mathbf{t} < \mathbf{t}_1$. Similarly, an individual bat is considered experienced at time $\mathbf{t}_2$ regarding a given box if it has been recorded in that box at some earlier time $\mathbf{t} < \mathbf{t}_2$. We define a leading-following (L/F) event to a given box at time $\mathbf{t}_3$ as the joint visit of two individuals, one naïve and one experienced at time $\mathbf{t}_3$. The details of how joint arrivals are calculated are presented later in the paper.
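The naïve/experienced definition can be expressed as a small predicate over the recordings. The sketch below is a minimal Python illustration, not the authors' R code. The record format is a hypothetical assumption: (bat, box, time in minutes since the start of the study). The function names are hypothetical as well.

```python
# Hypothetical record format: (bat_id, box_id, time in minutes since study start)
Record = tuple[str, str, float]


def is_experienced(bat: str, box: str, t: float, records: list[Record]) -> bool:
    """Experienced at time t regarding `box`: recorded in that box at some t' < t."""
    return any(b == bat and x == box and rt < t for b, x, rt in records)


def is_naive(bat: str, box: str, t: float, records: list[Record]) -> bool:
    """Naive at time t regarding `box`: never recorded in that box before t."""
    return not is_experienced(bat, box, t, records)


def is_lf_joint_visit(bat_a: str, bat_b: str, box: str, t: float,
                      records: list[Record]) -> bool:
    """A joint visit at time t qualifies as an L/F event if exactly one of the
    two bats is experienced (the leader) and the other is naive (the follower)."""
    return is_experienced(bat_a, box, t, records) != is_experienced(bat_b, box, t, records)
```

A bat's own recording at exactly time t does not make it experienced at t. This is because the definition requires a strictly earlier recording.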

If more than two bats arrive jointly, we form all possible L/F pairs, each consisting of one naïve follower and one experienced leader. If the leader and/or the follower were recorded multiple times, we use the recording times that minimize the difference between their appearances in the dataset (see Table S2 and the associated explanation). Finally, we refer to the absolute difference between the recording times of the leader and the follower as the time_difference of an L/F event.
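The pairing rule and the time_difference definition can be sketched as follows. The assumptions are that the set of experienced bats at the moment of the joint arrival is already known, and that times are minutes. The helper names are hypothetical.

```python
from itertools import product


def lf_pairs(joint_bats, experienced):
    """All (leader, follower) pairs in one joint arrival: every experienced
    bat is paired with every naive bat."""
    leaders = [b for b in joint_bats if b in experienced]
    followers = [b for b in joint_bats if b not in experienced]
    return list(product(leaders, followers))


def time_difference(leader_times, follower_times):
    """Absolute difference between leader and follower recordings, using the
    pair of recording times that minimizes it."""
    return min(abs(a - b) for a, b in product(leader_times, follower_times))
```

For example, `lf_pairs(["A", "B", "C"], {"A"})` yields `[("A", "B"), ("A", "C")]`. Two recordings of the leader at minutes 10 and 30 and a follower recording at 12.5 give a time_difference of 2.5 min.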

With this definition of L/F events, inferring L/F event patterns from the data relies on three parameters:

(1) lf_delay, the maximum allowed time difference (in minutes) between consecutive recordings of a leader and a follower.

(2) turnaround_time, the minimum time (in minutes) an experienced bat needs before it can potentially become a leader in an L/F event, i.e. the time needed to find and lead followers.

(3) occupation_deadline, the hour in the morning of the day of box occupation after which subsequent recordings from that box are ignored because of swarming behavior. Swarming is an example of local enhancement (Kerth et al.3, Kerth and Reckardt28).
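The three parameters can be read as filters on candidate L/F pairs. The sketch below is a minimal illustration under several hypothetical assumptions. Times are minutes since midnight at the start of the study. `occupation_day` is the 0-based index of the day the box was occupied. turnaround_time is interpreted as the minimum gap between the leader's first recording at the box and its recording in the L/F event. This last point is our reading of the definition, not a confirmed detail of the authors' procedure.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class LFParams:
    lf_delay: float             # max leader/follower time difference (minutes)
    turnaround_time: float      # min time before an experienced bat can lead (minutes)
    occupation_deadline: float  # hour after midnight on the occupation morning (3.0 = 3 a.m.)


def accept_lf_event(leader_time, follower_time, leader_first_seen,
                    occupation_day, p: LFParams) -> bool:
    """Hypothetical filter deciding whether a candidate pair counts as an L/F event."""
    cutoff = occupation_day * MINUTES_PER_DAY + p.occupation_deadline * 60
    if max(leader_time, follower_time) > cutoff:
        return False  # recorded after the deadline: possibly swarming, so ignored
    if abs(leader_time - follower_time) > p.lf_delay:
        return False  # too far apart to count as a joint arrival
    # Assumed reading of turnaround_time: the leader must have been experienced
    # for at least this long before it can lead another bat to the box.
    return leader_time - leader_first_seen >= p.turnaround_time
```

With the values eventually chosen in this section, the parameters would be `LFParams(lf_delay=5, turnaround_time=3, occupation_deadline=5)`.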

Calibrating parameter values

It is important to note that each of the three parameters affects the inference of L/F events differently. Values of lf_delay that are too large would lead us to incorrectly classify many visits of experienced and naïve bats as joint visits, i.e. as legitimate L/F events, even those that occur on different days. Values of turnaround_time that are too small, on the other hand, would force us to “break” one L/F event with one leader and multiple followers into separate L/F events in which earlier followers are falsely deemed leaders. Similarly, if occupation_deadline is too late in the morning, many joint visits due to swarming will be incorrectly inferred as L/F events. We discuss the influence of the parameters on the inference procedure in more detail in Section S.3.

To choose appropriate values for the three parameters, we use a purely data-driven procedure that statistically compares the distributions of L/F event time differences. This is more rigorous than the subjective, observation-based calibration often used in the analysis of field studies.

Empirical research on information transfer in Bechstein’s bats has suggested 3 min for lf_delay and 3 a.m. for occupation_deadline as a reasonable rule of thumb (Kerth and Reckardt 2003). We build upon these heuristics by (a) introducing the additional parameter turnaround_time defined above and (b) comparing the distributions of time differences of all L/F events (Fig. 1).

Figure 1

L/F time differences for the GB2 colony in 2008. Histograms show the absolute differences between the times at which the leader and the follower were recorded in all identified L/F events. Parameters: turnaround_time = lf_delay = 3 min (both plots), occupation_deadline = 2 a.m. (left) and occupation_deadline = 3 a.m. (right). Insets indicate the total number of identified L/F events.


Note that every combination of the three parameters is a 3-tuple that generates a set of L/F time differences from all L/F events identified in the dataset. Figure 1 shows histograms of L/F time differences for fixed values lf_delay = turnaround_time = 3 min, with occupation_deadline = 2 a.m. (left) and occupation_deadline = 3 a.m. (right). As there is no objective method to quantify the behavior underlying each of the parameters, we argue that L/F time differences best capture the effect that varying the parameters has on the identified L/F events. For example, visual inspection of Fig. 1 suggests that increasing occupation_deadline from 2 a.m. to 3 a.m. does not change the distribution of time differences. This indicates that swarming has not yet set in by 3 a.m. Otherwise, we would expect noticeably more events with long time differences. It also indicates that the additional L/F events in the right panel are genuine. Consequently, we would prefer occupation_deadline = 3 a.m., as it increases our sample size. Section S.4 details the expected effects of swarming on the distributions of L/F time differences.

The core of our method revolves around pairwise testing for statistical difference in the distributions of L/F time differences generated by different values of the 3-parameter tuple. We start from the reasonable default values mentioned above and summarize the result of the calibration in Tables 2 and 3.

Table 2 GB2 colony in 2008 with lf_delay = 5 min.


To obtain sufficient sample sizes for the comparison, we chose to analyze the 2008 dataset of the GB2 colony (Table 1). In 2008, this colony had the highest number of discovered and occupied boxes, the second-largest colony size and a large number of individual readings. We therefore expected to identify the largest number of L/F events from this dataset, and thus to obtain the most robust parameter values.

In Table 2, lf_delay is fixed at 5 min, while occupation_deadline is varied over {2 a.m., 3 a.m., 5 a.m., 8 a.m.} and turnaround_time over {2, 3, 5, 7, 9} min. For each value of turnaround_time (rows in the table), we compare the time difference distributions ($\mathcal{X}_i$/$\mathcal{Y}_i$) between all possible pairs of occupation_deadline values. The comparison uses a bootstrapped Wilcoxon rank-sum test of the null hypothesis that the two distributions are the same. The test is run against two alternatives: the two-sided alternative $\mathcal{H}_1$, and the one-sided alternative $\mathcal{H}_2$ that $\mathcal{X}_i < \mathcal{Y}_i$. Each table cell shows the p values of the two-sided and the one-sided test, respectively.
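A standard-library sketch of the rank-sum comparison is shown below. It uses the normal approximation to the Wilcoxon rank-sum (Mann-Whitney U) statistic, without a tie correction to the variance. The paper does not spell out its bootstrap scheme here. The scheme below is therefore an assumption: resample both samples with replacement and report the median p values. All function names are hypothetical.

```python
import math
import random
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_pvalues(x, y):
    """Rank-sum test (normal approximation). Returns (two-sided p for H1,
    one-sided p for H2: x tends to be smaller than y)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U of sample x
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p_two = math.erfc(abs(z) / math.sqrt(2))
    p_less = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return p_two, p_less


def bootstrapped_rank_sum(x, y, n_boot=1000, seed=0):
    """Assumed bootstrap: resample both samples with replacement and return
    the median two-sided and one-sided p values over all resamples."""
    rng = random.Random(seed)
    p_two, p_less = [], []
    for _ in range(n_boot):
        a, b = rank_sum_pvalues(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        p_two.append(a)
        p_less.append(b)
    return statistics.median(p_two), statistics.median(p_less)
```

In the Table 2 setting, this comparison would be applied, for each turnaround_time, to every pair of occupation_deadline values. One way to enumerate those pairs is `itertools.combinations`.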

As an example, fixing turnaround_time = 2 min, we see that the distribution of L/F time differences for occupation_deadline at 2 a.m. is not statistically different from the distribution with occupation_deadline at 3 a.m. (p value = 0.725). This indicates that the nature of the identified L/F events is invariant to the later deadline; hence, it is unlikely that we have inadvertently included swarming effects. Further inspection of the table reveals that significant changes in L/F time differences occur only in comparisons involving occupation_deadline = 8 a.m. The one-sided test indicates the direction of these changes. L/F events inferred up to 8 a.m. on the day of occupation tend to have larger time differences than those inferred with earlier occupation deadlines. This is in line with the reasoning in Section S.4 of the Supplementary Material and points to the presence of swarming effects. Therefore, occupation_deadline = 8 a.m. is likely too late.

Moreover, this conclusion also holds when varying turnaround_time. Within the range considered, the impact of this parameter on the L/F time differences appears to be small. The effect of turnaround_time is primarily on the number of identified L/F events. Assuming longer recruitment delays excludes events in which the leader found a follower relatively quickly (Table 3).

Table 3 Number of identified L/F events for the GB2 colony in 2008 with different values of the three parameters.


Considering these arguments, we conclude that lf_delay = 5 min, turnaround_time = 3 min and occupation_deadline = 5 a.m. provide the best trade-off. They maximise the number of identified L/F events while keeping the distribution of L/F time differences undistorted by swarming. We also see these values as improvements over the common heuristics mentioned at the beginning of this section.

Constructing leading-following networks

With the parameters calibrated following the above procedure, we identified all L/F events in each of our datasets (five datasets for colony GB2 and four for colony BS2, covering all study years; Table 1).

We then constructed directed and weighted L/F networks from each dataset. In these networks, a node represents an individual bat, and a link between two nodes indicates their joint involvement in an L/F event. Links are directed: a link from node A to node B, denoted $A \rightarrow B$, means that individual A followed individual B to a given experimental box. The weight of this link is the number of times that A followed B (to different experimental boxes) during the study period in the respective year. Note that in constructing these L/F networks, we ignore the target box of each L/F event and simply sum up the number of L/F events to compute the link weights.
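Building the weighted network from a list of L/F events amounts to counting (follower, leader) pairs while discarding the box. This is a minimal Python sketch. The event tuple layout (leader, follower, box) and the dictionary representation of the network are hypothetical choices for illustration.

```python
from collections import Counter


def build_lf_network(events):
    """events: iterable of (leader, follower, box) tuples, one per L/F event.
    Returns {(follower, leader): weight}. The key (A, B) is the link A -> B,
    meaning 'A followed B'. The target box is ignored, as in the aggregated
    networks described above."""
    return dict(Counter((follower, leader) for leader, follower, _box in events))
```

For instance, if A followed B to two different boxes and C followed A once, the result is `{("A", "B"): 2, ("C", "A"): 1}`.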

We also compute the number of weakly connected components (WCCs) and strongly connected components (SCCs). A WCC of a network is a maximal sub-network in which any node can be reached from any other node, regardless of the direction of the links. The connection can be a direct link between the two nodes or a sequence of links through other nodes. An SCC is defined in the same way, with the additional restriction that the direction of the links must be respected when connecting any two nodes. As we explain in the next section, these two measures are particularly important for judging the extent to which information can spread in a network.
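Counting WCCs and SCCs for networks of colony size can be done with plain reachability searches. This is a minimal sketch, not the authors' R implementation. For larger graphs, Tarjan's or Kosaraju's algorithm would be the usual choice. The example network is a hypothetical reconstruction of the Fig. 2 toy network from the scores quoted in its caption (1→4, 2→4, 3→4, 4→5, 5→1).

```python
from collections import defaultdict


def _reachable(adj, start):
    """All nodes reachable from `start` (including itself) by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def count_components(links, nodes=()):
    """Return (number of WCCs, number of SCCs).
    links: iterable of directed (A, B) pairs; nodes: optional isolated bats."""
    links = list(links)
    nodes = set(nodes) | {n for link in links for n in link}
    directed, undirected = defaultdict(set), defaultdict(set)
    for a, b in links:
        directed[a].add(b)
        undirected[a].add(b)
        undirected[b].add(a)
    wcc, seen = 0, set()
    for n in nodes:  # WCCs: link direction is ignored
        if n not in seen:
            seen |= _reachable(undirected, n)
            wcc += 1
    reach = {n: _reachable(directed, n) for n in nodes}
    scc, assigned = 0, set()
    for n in nodes:  # SCCs: n and m share one iff each reaches the other
        if n not in assigned:
            assigned |= {m for m in reach[n] if n in reach[m]}
            scc += 1
    return wcc, scc
```

On the reconstructed toy network this gives one WCC and three SCCs: {1, 4, 5}, {2} and {3}.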

Social network analysis

Quantifying individual influence

We can now use the topology of the constructed networks, i.e. the relations between nodes expressed by their links, to characterize the position of individuals in such a network. Our aim is to identify those nodes, i.e. individual bats, that are most influential in leading other bats. In social network analysis, the importance, or influence, of a node in a certain dynamical process flowing through the network is referred to as its centrality. Various centrality measures are in use, and each makes certain implicit assumptions about the dynamical process flowing through the network30. Choosing a centrality measure is, thus, context-dependent (see Fig. 2). An improperly chosen centrality measure cannot be interpreted correctly and can therefore lead to wrong conclusions.

Figure 2

Differences between the three candidate centrality measures. The centrality of each node under each measure is indicated next to it. (a) In-degree centrality. Here, only direct influence is measured. Individual 4 is most influential, as she spread information to three different individuals. Individuals 1 and 5, with one follower each, have equal importance. (b) Eigenvector centrality. Individuals 2 and 3 have no followers, so they are attributed zero influence and contribute nothing to the influence of their leader, individual 4. In turn, 1, 4 and 5 each have one follower of non-zero importance; hence, they have the same eigenvector scores. (c) Second-degree centrality with $\alpha = 0.5$. Individual 4 has a higher centrality than her in-degree score, as we account for the indirect contribution of individual 1 ($3 + 0.5 \times 1 = 3.5$). However, 5 is now more important than 1, because 4 contributes to 5 indirectly ($1 + 0.5 \times 3 = 2.5$).


In-degree, eigenvector and second-degree centrality

In our case, an appropriate centrality measure must reflect the notion of individual influence in spreading information about suitable roosts. If influence is best proxied by the total number of roosts that a given bat made known to the colony, then a suitable centrality measure is the in-degree centrality (Fig. 2a). This quantity measures individual importance as the total number of bats to which an experienced bat spreads information directly, i.e. the number of L/F events in which an individual participated as a leader. In-degree centrality is, thus, calculated as the sum of the weights of all directed links that point to a given experienced individual.
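Weighted in-degree is a single pass over the links. The sketch below assumes the hypothetical representation `{(follower, leader): weight}` for the network, where each key is a link follower → leader.

```python
from collections import defaultdict


def in_degree_centrality(weights):
    """weights: {(follower, leader): w}, i.e. the link follower -> leader.
    A bat's score is the summed weight of links pointing to it, i.e. the
    number of L/F events in which it was the leader."""
    score = defaultdict(float)
    for (follower, leader), w in weights.items():
        score[leader] += w
        score[follower] += 0.0  # bats that only ever follow still get a score of 0
    return dict(score)
```

On the Fig. 2 toy network, which is our reconstruction from its caption, this gives a score of 3 for individual 4, 1 for individuals 1 and 5, and 0 for individuals 2 and 3. These values match panel (a).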

In-degree centrality measures the total number of leadings, i.e. direct influence. It does not consider how the information a leader passes to its followers propagates further through the colony. To also account for such indirect effects, an alternative centrality measure is eigenvector centrality (Fig. 2b). In a social network, a node has high eigenvector centrality if it is pointed to by nodes that themselves have high eigenvector centralities. In other words, an experienced bat leading a few bats who themselves lead many others can be more influential than a bat leading many bats who in turn never lead. The computation of eigenvector centralities is presented in Section S.5 of the Supplementary Material.
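The authors' own computation is given in Section S.5 and may differ from this sketch. As a generic illustration, eigenvector centrality on the directed L/F network can be approximated by power iteration. In this approach, each bat's score is repeatedly replaced by the weighted sum of the scores of the bats that followed it. Adding the identity matrix (the "+ x_i" term) is our own choice. It leaves the eigenvectors unchanged and prevents the iteration from oscillating on directed cycles. The sketch assumes that the network contains at least one directed cycle; on acyclic networks eigenvector centrality is degenerate. Function and parameter names are hypothetical.

```python
def eigenvector_centrality(weights, n_iter=1000, tol=1e-12):
    """Power iteration on (I + A^T), where the link follower -> leader passes
    the follower's score to the leader. Scores are scaled so the maximum is 1."""
    nodes = {n for link in weights for n in link}
    x = {n: 1.0 for n in nodes}
    for _ in range(n_iter):
        new = dict(x)  # identity shift: x_i <- x_i + sum_j w_ji * x_j
        for (follower, leader), w in weights.items():
            new[leader] += w * x[follower]
        norm = max(new.values())
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    return x
```

On the Fig. 2 toy network (reconstructed from its caption), individuals 1, 4 and 5 converge to the same score and individuals 2 and 3 to zero, as panel (b) describes.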

The in-degree and eigenvector centralities represent two extremes. The former measures exclusively direct influence. The latter additionally measures all possible indirect ways in which information can flow from one individual to all the rest. Eigenvector centrality, however, considers information chains of all lengths to be of equal importance, regardless of the target experimental box. An information chain of length $k$ is simply a sequence of $k$ L/F events identified for a given network, in which the follower in an earlier L/F event is the leader of a later L/F event.

For example, two identified L/F events $A \rightarrow B$ and $C \rightarrow A$ constitute a chain of length two (in addition to forming two separate chains of length one). If both L/F events were to the same experimental box, then B ought to obtain direct importance from having led A. B should also obtain an indirect contribution: were it not for B, A would not have learned about this box and thus could not have led C to it. This assumption is not entirely correct, however. It is possible, though unknowable, that A would have found the roost through its own exploration. It is also possible that A “forgot” the information obtained from B and re-visited the box before leading C. The latter issue is exacerbated as the length of the event chains we consider increases. However, if the two L/F events were not to the same target box, B should not obtain any indirect benefit from the second L/F event. Note that in-degree centrality only considers L/F chains of length 1, i.e. direct influence.

Since we construct aggregated L/F networks (i.e. links represent leading-following, disregarding the target box), we risk attributing too much importance to individuals when using eigenvector centrality. The metric will simply grow with the length of the chain and individuals who are part of longer chains will tend to be quantified as more influential. This would risk distorting individuals’ influence scores, since it is highly unlikely that a long L/F chain had the same target box for all L/F events in the chain.

We again use our data to inform the proper balance between direct and indirect influence, and thus to choose the right centrality measure. Figure 3 shows the relative frequency, aggregated over all datasets, of observing chains of L/F events of a given length. This frequency can be interpreted as the probability of finding chains of that length. As the inset in Fig. 3 demonstrates, the probability distribution resembles an exponential distribution. The plot further indicates that chains longer than 16 did not occur in any of our datasets. More importantly, chains of length up to two constitute more than 80% of all observed chains, and the probability of longer chains decreases sharply. We, thus, posit that limiting the influence computation to L/F chains of length at most two properly reflects the majority pattern observed in the data. This limit is a reasonable heuristic. It minimizes the risk of inflating indirect influence scores by including long L/F chains that do not represent genuine information spread about the same roost.
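One way to obtain a chain-length distribution like the one in Fig. 3 is to count every time-ordered chain, including the shorter chains contained in longer ones, as in the $A \rightarrow B$, $C \rightarrow A$ example above. The sketch uses dynamic programming over events sorted by time. The paper does not state whether it counts all chains or only maximal ones, so counting all chains is an assumption. The event layout (time, leader, follower) and the function name are hypothetical. As in the aggregated networks, the target box is ignored.

```python
from collections import Counter


def chain_length_distribution(events):
    """events: list of (time, leader, follower) tuples. Counts every chain in
    which the follower of an earlier event is the leader of a later one and
    returns {chain length: relative frequency}."""
    events = sorted(events)
    ending = []        # ending[i][k]: number of chains of length k ending at event i
    totals = Counter()
    for i, (t, leader, _follower) in enumerate(events):
        counts = Counter({1: 1})  # the event on its own is a chain of length 1
        for j in range(i):
            t_j, _, follower_j = events[j]
            if t_j < t and follower_j == leader:
                for k, c in ending[j].items():
                    counts[k + 1] += c  # extend every chain ending at event j
        ending.append(counts)
        totals.update(counts)
    n = sum(totals.values())
    return {k: c / n for k, c in sorted(totals.items())}
```

For the two events in the example (B led A, then A led C), the result is two chains of length one and one chain of length two, i.e. `{1: 2/3, 2: 1/3}`.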

Figure 3

Probability distribution of the lengths of L/F event chains, calculated over all nine datasets. Inset: log-linear plot of the data.


For this reason, we define a new metric, second-degree centrality (Fig. 2c). It computes the centrality of a focal individual as its in-degree plus the sum of the in-degrees of its followers, weighted by a factor $\alpha$. In that sense, the followers of one’s followers are its second-degree followers. This reflects our observation that chains of length up to two constitute the majority in all datasets. We, thus, use second-degree centrality with $\alpha = 0.5$ as the main measure for quantifying individual influence. We will, however, keep in-degree and eigenvector centrality for comparison purposes. This ensures that our heuristic metric does indeed strike a balance between the two extremes and produces consistent results.
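Written as a formula, the verbal definition reads $C_2(i) = k_{\mathrm{in}}(i) + \alpha \sum_{j \in F(i)} k_{\mathrm{in}}(j)$. Here $k_{\mathrm{in}}$ is the weighted in-degree and $F(i)$ the set of bats that followed $i$. The notation is ours, and counting each follower once (rather than once per link weight) is an assumption. A minimal Python sketch over the hypothetical `{(follower, leader): weight}` representation:

```python
from collections import defaultdict


def second_degree_centrality(weights, alpha=0.5):
    """C2(i) = in-degree(i) + alpha * sum of in-degrees of i's distinct followers.
    weights: {(follower, leader): w}, i.e. the link follower -> leader."""
    indeg = defaultdict(float)
    followers = defaultdict(set)
    for (follower, leader), w in weights.items():
        indeg[leader] += w
        indeg[follower] += 0.0   # make sure pure followers appear with 0
        followers[leader].add(follower)
    return {i: indeg[i] + alpha * sum(indeg[j] for j in followers[i]) for i in indeg}
```

On the Fig. 2 toy network, as reconstructed from its caption, this reproduces panel (c): 3.5 for individual 4 and 2.5 for individual 5. It also gives 1.5 for individual 1, so 5 ranks above 1, and 0 for individuals 2 and 3.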

All analysis was done in the R programming language.

Ethics approval

Handling and tagging of the bats were conducted under the permits for species protection (55.1-8642.01-2/00) and animal welfare (54-2531.01-56/06; 55.2-2531.01-79/10; 55.2-2531.01-47/11) that had been issued by the government of Lower Franconia.


Source: Ecology - nature.com
