Intolerant baboons avoid observer proximity, creating biased inter-individual association patterns

All research methods included in this study were performed in accordance with the relevant guidelines and regulations, under ZA/LP/81996 research permit, with ethical approval from the Animal Welfare Ethical Review Board (AWERB) at Durham University. The authors confirm the study was carried out in compliance with ARRIVE guidelines.

All inter-individual association data was collected between June 2018 and June 2019 on a wild habituated group of Afro-montane chacma baboons in the western Soutpansberg Mountains, South Africa (central coordinates S29.44031°, E23.02217°) (for study site description see²). The study group was habituated circa 2005 and was the focus of intermittent research attention until 2014. The study area experienced long-term anthropogenic activities (local farming, forestry, and residences) prior to 2005, as such, consistent interactions with humans have been ongoing with this population for some time. From 2007 onwards numerous researchers were able to collect expansive datasets on the study group (e.g. Refs.^17,18), indicating that habituation was at a typical level found elsewhere (also validated by AA and RH, who had researched chacma baboons elsewhere). From 2014 the group received full day (dawn until dusk) follows 3–4 days a week, with occasional gaps of up to 5 weeks in duration. These gaps did not appear to effect habituation levels, likely due to the presence of other researchers at the field site who always tried to act benignly when encountering the habituated group. The follow schedule was designed so that the study group retained as much of their natural interactions with predators as possible by ensuring the baboons spent significant time without observers who may influence the frequency and nature of predator–prey interactions¹⁹.

The study site was located in a private nature reserve and the study group was not hunted during observation gaps or engaged in any conflict with humans, other than occasionally being scared (chasing, yelling, throwing stones etc.) from a small plantation by local workers, usually resulting in alarm barks and fleeing responses. However, the study group appeared adept at recognising the differences between researchers and these threats²⁰. The majority of the study group’s home-range typically overlapped with the core area of the Lajuma Research Centre, and as a result, interactions with staff living in the area, unfamiliar researchers, and tourists were frequent. However, the baboons had not engaged in ‘raiding’ residences, threatening humans, or any other potentially negative symptom of habituation before the end of this study.

Sampling methodology for proximity associations

30-s focal sampling was used to collect proximity associations between all group members (excluding infants). All data was collected between June and December 2018 and January and June 2019; the majority of 2018s data was collected during the wet season, whilst most of 2019s data was collected during the dry season. To account for time of day, each day was split into four time-periods that were seasonally adjusted ensuring each period accounted for 25% of the current day length. A randomly ordered list of individuals was produced for each day, the first individual identified from the top 15 (approx. 20% of group size) individuals on the list was sampled immediately. Individuals could only be sampled once per time period per day, and a maximum of twice total per day. All individuals received at least 14 focal observations per time period (56 total) across the study period (see below for how we handled uneven sampling for some individuals). A video camera was used by AA (the only observer to collect this data) to record all focal observations (Panasonic HC-W580 Camcorder). At the end of the 30-s focal observation the identities of all neighbouring conspecifics within 5 m, 2.5 m, 1 m, and touching the focal animal were recorded (audibly by AA). We chose the end of the focal observation to record this data as this was most likely to reflect the conditions during the focal, i.e., the observer had been in proximity for at least 30 s.

Neighbour information was extracted from video footage and entered manually by AA and AW. Data was split into separate years to reflect an observation gap of several weeks and to understand whether there was consistency in the hypothesized effects through time and to reflect underlying differences in environmental conditions during the two study periods; during the dry season fruits and seeds are scarce and day lengths are several hours shorter than in the wet season such that day journey lengths are often shorter in the dry season and animals are much more sedentary which could impact inter-individual spacings. In 2018 each individual was sampled between 28 and 30 times; 28 focals were randomly selected from each individual to make sampling even. For 2019 there were between 25 and 27 focals per individual; 25 samples of each individual were randomly selected. Observations were undertaken at a range of distances. For both years the median end observer distance was 4.5 m; data was thus split into close focal observations of less than or equal to 4.5 m (2018: n = 918, 2019: n = 809), and observations greater than 4.5 m (2018: n = 902 2019: n = 816). See supporting information Table S1 for summary statistics of the observation distances of each individual.

We did not make any attempt to record our focal data evenly across the various habitats at our field site (see Supporting information text S1 for complete habitat descriptions) as our previous research indicated there was little difference in general spatial cohesion/inter-individual proximity patterns across these habitats (see Supporting information text S2 and Table S2). As a result, we considered it unlikely that there were fundamental differences in inter-individual association patterns across habitats, or that observers struggled to reliably detect or identify neighbours in dense habitats. We do acknowledge, however, that there will always be an element of bias with such methods, as observations were avoided, aborted, or excluded if visual obstructions (e.g., cliffs, rocks, walls, buildings, very dense vegetation etc.) prohibited accurate assessments; the observations used in the current study are from occasions when these factors were not an issue.

During this study the group contained between 85 and 92 individuals. Age-sex class was defined according to secondary sexual characteristics (e.g., testes descending/enlarging, sexual swelling, canine eruption) and changes in pelage throughout juvenile development (see Supporting information text S3 for full descriptions). All 65 non-infant individuals that were present during 2017 (when displacement tolerances were calculated) and still remaining in the group by the end of 2019 were used in this study (4 individuals from the prior FID study were no longer present). There were a high number of births between 2018 and 2019, but none were independent by the time either of our sampling periods begun in 2018 or 2019. There was no immigration of foreign individuals, but two individuals disappeared, both during the 2018 focal sampling period. As a result, we had a very consistent pool of individuals to sample from during this study. We removed all data associated with the two individuals who disappeared as their occurrences as neighbours would have been poorly sampled (due to missing more than half the study) relative to the rest of the group which would have led to statistical biases²¹.

Flight initiation distance procedure

Individual displacement tolerance estimates were previously quantified in our previous research² using a flight initiation distance (FID) procedure²² that was completed between October 2017 and April 2018, prior and independent to the commencement of proximity association focal sampling in June 2018. Individual baboons were approached by an observer, and the distance at which the animal displaced away from the observer measured (see Supporting information Table S2 for summary statistics). This procedure was repeated 24 times for each individual baboon, with approaches spread evenly across two observers differing in familiarity. At the beginning of each approach we also recorded several behavioural, social, and environmental factors that could have hypothetically influenced an individual’s FID² including whether the animal was engaged (e.g., digging or grooming) or not engaged (e.g., resting, chewing food, being groomed), habitat type (open/closed: see Supporting text S1), whether the animal was on the ground or sat on a low branch or rock within 50 cm of the ground, the number of conspecifics within 5 m of the focal animal, and whether there had been any external events within the preceding 5 min (e.g., alarm calls, aggressions, encountering another group or predator). During the approach, we also recorded the visual orientation distance (the distance at which the focal animal directed its line of vision towards the head of the approaching observer) and whether one of the focal animal’s neighbours had displaced/fled before the focal animal. Although all but neighbour flee first and external events showed some importance for predicting looking (see Table S4), FID was found to be distinct amongst individuals and repeatable within each individual, evidence that displacement tolerance may be an individual level trait². Full details of methods, statistical analysis, and results (including comparison to the original model) for this updated model are in Supporting information text S4, with model summary results for the previous and updated models in Tables S3 and S4.

The notion of an observer approaching a habituated primate may be considered atypical or likely to result in habituation/sensitization effects or agonistic behaviours being directed towards the approaching observers. However, our previous study² showed that almost all approaches resulted in the animal passively relocating (98.85%), a very benign response identical to the behaviours of subordinate baboons displacing away from dominant conspecifics. This suggests that in this group, observers may be considered equivalent to a high-level social threat². Throughout observation periods on habituated animals, observers are likely to approach or displace animals either incidentally or accidentally multiple times throughout the day, especially during lengthy focal observations. As such, the approach methodology is unlikely to represent a stimulus outside of the norm for our study animals. This may explain why displacement responses were so passive and why there was no evidence of habituation or sensitization effects across the group or individually through a range of temporal periods² or after life-threatening events²⁰. As a result, our situation was possible without risk of causing stress or anxiety in the study subjects, eliciting agonistic behaviours towards observers, or interfering with their prior habituation levels.

Statistical analysis

Influence of tolerance and observer distance on inter-individual association patterns

Quantifying displacement tolerance

To quantify displacement tolerance towards observers we extracted the individual conditional modes from the updated FID model using the ranef function in brms. Conditional modes are often referred to as Best Linear Unbiased Predictors (BLUPs) and are the difference between the predicted mean population-level response for a given set of treatments (i.e., population-level effects) and the predicted responses for each individual, and therefore infer the extent to which each individual differs from the population mean. The conditional modes and their associated standard deviations can be found in supporting information Table S5.

To validate that the conditional modes from the updated model were both representative of the individual’s flight responses and in line with the estimates produced from our previous study² we performed additional tests. Firstly, we performed a Pearson’s correlation between the conditional modes from the updated model and the conditional modes from the previous article. Individual tolerance estimates were consistent (r(63) = 0.915, p < 0.001) despite the changes in model structure from². Additionally, the intraclass correlation coefficient remained almost identical (updated individual identity ICC: 0.60; highest density intervals (HDI) for posterior samples at 95% intervals, 0.50, 0.70) to the findings reported in our previous research² (ICC, 0.65; HDI, 0.56, 0.74). Finally, we also explored the correlation between the conditional modes of the updated FID model and each individual’s mean and median FIDs as raw measures have also been used to estimate individual traits, e.g., time spent inspecting a novel item for boldness²³ and proportion of days trapped for trappability²⁴. We found high correlation between the conditional modes and each individual’s mean (r(63) = 0.861, p < 0.001) and median (r(63) = 0.847, p < 0.001)) FID values.

To ensure the data collected from the 2017/2018 FID approaches were applicable to focal data collected throughout 2018 and 2019 we tested a further 15 individuals (approximately 25% of the group) beginning two weeks after the proximity focals were completed (late June 2019). We tested the effect of year on individual tolerance estimates and found them to be consistent across years (see supporting information text S5 for details). Thus, we considered the flight initiation distance data collected during 2017/18² to be applicable to our current study.

Mixed model analysis investigating the interaction between individual tolerance and observer distance on individual occurrences as a neighbour

The inter-individual association data were coded as a count of how often each individual occurred as a neighbour during the focal observations of the remaining (n-1) group members. Each individual had separate counts for each observer distance (i.e., over/under 4.5 m) within each proximity buffer (i.e., touch, 1 m, 2.5 m, and 5 m), separately for each year (i.e., 2018 and 2019). These counts were the response variables in four generalised linear mixed effects models (separate models for each proximity buffer). In all models, the count of an individual’s occurrences as a neighbour was predicted by the interaction between observer distance, individual tolerance estimate (conditional modes from updated FID model), and year. Tolerance was on the spectrum whereby low/negative values indicated low tolerance and high (positive) values indicated high tolerance. We specified that all combinations of the fixed effects and their two-way interactions were also modelled.

Although using conditional modes without their associated error is not ideal (see²⁵ for discussion), the alternative approaches were not viable for our data. Firstly, measurement error models produced divergent transitions with very low effective sample sizes rendering inference from the models unreliable. Secondly, multivariate approaches were exceptionally challenging to initialise and fit; in our case as the bivariate model with dual response variables of FID and individual occurrences as neighbour drew from two datasets derived from independent methodologies requiring distinct model structures and response families. As a result, despite our conditional modes losing the error around the point estimates, our models using these estimates were the most robust we could fit and draw inference from. Previous research has also consistently used this approach (without the associated error) in personality studies as response variables^26,27, population-level/fixed effects^28,29, or as variables in correlations and principal component analysis³⁰. However, we do acknowledge that losing the associated error from the conditional modes may contribute to anticonservative estimates in the models analysing occurrence patterns.

For all models we specified the interaction between tolerance and observer distance as a random slope over individual identity, with correlated intercepts. As with the population-level effects, we also allowed the interaction to model each covariate within the interaction as separate slopes over individual identity, whilst year was also included as a separate random slope over individual identity to ensure we estimated the within individual and between individual differences (in occurrences as a neighbour) due to sampling year. We did not model this as a 3-way interaction due to model performance issues; however, the tolerance and observer distance interaction was vital to ensure we effectively estimated the differences (in occurrences as a neighbour) due to between individual tolerance differences, the individual slopes of observation distance, and the between individual differences due to the interaction between tolerance and observation distance.

We did not centre or scale any variables as year and observer distance were categorical and tolerance represented each individual’s mean difference to the population mean. We also calculated how many focal samples it was possible for each individual to appear in (as a neighbour) as each individual was focal sampled differentially. For example, there were 888 focal samples (of the n-1 other individuals) that individual ARL could have occurred as neighbour in 2018 with the observer over 4.5 m away, whilst there were 886 for individual ATH for the same year and observer distance due to ATH receiving two more focal observations than ARL in these contexts. We included the natural log of this variable as an offset term to account for this differential sampling. All models were fit with a Bayesian procedure using the brm function³¹ in the R software³². Each model was run for four Hamiltonian Markov chains for 15,000 iterations, with warmup iterations set to 5000, and adapt_delta set to 0.95 all these parameters were set higher than default. A Poisson response distribution was defined, with default link functions. Default improper flat priors were assigned to all population-level effects and Student t default priors (df = 3, mean = 0, scaling factor = 2.5) assigned to the remaining model components³¹. However, in the case of the standard deviations of group-level (i.e., random) effects these parameters are constrained to be positive and therefore half Student-t priors were implemented.

Model fits were assessed via graphical checks, firstly by examining trace plots to ensure convergence/good mixing of multiple chains, and secondly, by comparing our observed data to data simulated from the posterior predictive distribution of our models using the pp_check and pp_stat functions (brms and bayesplot packages). In each case the simulated data from our models generated data that captured the vast majority of values in our observed response distribution with no signs of dispersion issues^31,33. To further assess that there were no dispersion issues, we followed³⁴ and fit each proximity buffer model (i.e., 5 m, 2.5 m, 1 m, and Touch response variables) with a negative binomial response distribution and compared each model to their respective Poisson models using leave-one-out cross-validation (LOO-CV). Although the models performed similarly, in all cases the Poisson models produced higher expected log-predictive densities than the negative binomial models, indicating higher predictive accuracy and therefore confirming that there were no dispersion issues with any of our Poisson models³⁴.

The outputs of each model also showed that all Rhat values were equal to 1, indicating accuracy of the response variable with regard to the Poisson response distribution and excellent chain convergence. In addition, the bulk effective sample sizes (ESS) were considerably above the minimum advised threshold (100 times the number of chains), with tail ESS also being well sampled. Together these factors highlight that the model produced high estimation accuracy, including at the tails of the distribution³¹. Finally, we checked for multicollinearity between variables using the check_collinearity function from the Performance package³⁵, and all predictors produced variance inflation factors of less than 4, strong evidence that multicollinearity was not an issue.

Relying purely on model estimates and 95% credible intervals for inference can be subjective and overlooks the extent to which the posterior is equivalent to zero³⁶. Therefore, to ensure as much information about the posterior was included in inference, we additionally calculated the 89% Highest Density Interval (HDI) of the posterior distribution, the percentage of the posterior distribution within the region of practical equivalence (ROPE), and the probability of direction (pd) for each population-level (i.e., fixed) effect within each model. The HDI reveals the upper and lower parameter values of the posterior based on all values within the 89% range, these points therefore have a higher probability density than those outside the 89% range³⁷. The ROPE is the region that equates to the null hypothesis, although there are no set rules for defining this interval as it relates to the variables being analysed³⁷. In our case the response variable was a count and so we defined the area around 0 that equated to the ‘null’ as − 0.1 to 0.1, and this was additionally validated using the rope_range function from bayestestR³⁸. We therefore computed the proportion of the 89% HDI of the posterior distribution within the − 0.1 to 0.1 ROPE range. The pd variable is an index for inspecting whether each fixed effect has directionality (i.e., is positive or negative); pd always ranges from a minimum of 50% (i.e., equal distribution of positive and negative posterior values) to 100% (i.e., all posterior values are either positive or negative)³⁹. It has been shown that pd has a 1:1 correspondence with p-values calculated using frequentist methods³⁹.

We also calculated the conditional Bayesian R² estimates for each model using the loo_R2 function from the performance package ^40,41. In ordinary least squares (OLS) and maximum likelihood estimation regressions (MLE) the R² value is the variance of the predicted values divided by the variance of the data. In Bayesian analysis however, the variance of the predictions can be greater than the variance of the data. To address this issue, a Bayesian R² has been proposed⁴⁰, in which the R² estimate is the variance of the modelled predicted means divided by the sum of variance of the modelled predictive means and the modelled residual variance. The Bayesian R² value is therefore an estimate of the proportion of variance explained for future observations predicted using a given model, i.e., it is data-driven⁴⁰.

Implications to proximity association matrices and network analyses

Constructing association matrices collected at different observation distances

Symmetric, weighted (i.e., valued) dyadic association matrices were constructed for proximity data collected at different observation distances separately for each proximity buffer across both years, resulting in a total of 16 matrices. Matrices were symmetric because connections between individuals were derived from undirected scans of the proximity of neighbouring baboons to a focal animal¹⁴. We initially calculated the strength of connection between each dyad as the sum of how often each individual occurred as a neighbour of one another. The median (plus min and max) focal samples for each dyad were as follows: 2018 _{within 4.5m}: 26 (2–42), 2018 _{over 4.5m} 29 (2–51), 2019 _{within 4.5m} 27 (5–55), and 2019 _{over 4.5m}: 24 (8–50). As individuals were not sampled evenly across both observation distances (within/over 4.5 m), we converted this value to a proportion by dividing the connection strength by the sum of both individual’s (in each dyad) sampling effort within each observation distance. This gave each potential dyad exactly two connections of equal values (i.e., symmetrical): the proportion of their combined focal observations they occurred in proximity to one another for each observation distance. This was done separately for each proximity buffer within each year.

Comparing association matrices collected at different observation distances

To determine whether there were significant differences between association matrices collected when the observer was within 4.5 m versus beyond 4.5 m, we ran mantel tests between the matrices of each proximity buffer, within each year, using the mantel function from the vegan package⁴² with permutations set to 999 and Spearman’s rank as the correlation method. Mantel tests perform correlation tests between pairs of matrices and produce a standardised Mantel test statistic ranging from − 1 to 1 that infers the strength of the relationship between paired points from each weighted/directed matrix.

Investigating whether individuals exhibit similar local positions in networks produced from different observation distances

We used the R package igraph⁴³ to generate a network for each of the aforementioned matrices and to calculate the individual network metrics. We assessed each individual’s centrality on three levels: (1) degree—the total of an individual’s out- and in-degree connections, (2) closeness—the distance required for an individual to access the remaining individuals, and (3) betweenness—the number of shortest paths (between dyads) going through an individual. As proximity associations were considered undirected, we utilised only undirected metrics. As the shortest paths were weighted they were interpreted as distances⁴³. We used non-parametric Spearman’s rank correlations to investigate whether individuals held similar rank order positions for degree, closeness, and betweenness across networks produced from different observation distances. This was done separately for each proximity buffer within each year.