
Cumulative cultural evolution and mechanisms for cultural selection in wild bird songs

Study population and song recordings

All animal procedures were carefully reviewed by the Williams College IACUC (WH-D), the Bowdoin College Research and Oversight Committee (2009–18), and the University of Guelph Animal Care Committee (08R601), and were carried out as specified by the Canadian Wildlife Service (banding permit 10789D).

We studied Savannah sparrows (Passerculus sandwichensis) at the Bowdoin Scientific Station on Kent Island, New Brunswick, Canada (44.5818°N, 66.7547°W). Since 1988, individuals nesting within a 10 ha study area in the middle of the island (30–70 pairs each year; part of a larger population of 350–500 males breeding on Kent Island and two adjacent islands) have been colour-banded to facilitate visual identification, and complete demographic information is available for birds on the study site (though not for the entire population) for the years 1989–2004 and 2009–2013. Because of strong natal and breeding philopatry51, birds hatched on the study site itself represent 40–80% of adult breeders in that area, and because of the systematic banding program, ages are known. Each year adds a new generation to the population, with yearlings making up approximately half of the adult breeding males. The birds banded and recorded on the study site are estimated to make up 10–20% of the Savannah sparrow population on Kent Island and two nearby islands.

Details of the recording methods used in this study (covering the years 1980, 1982, 1988–9, 1993–8, and 2003–13) can be found elsewhere36,49. Using digitally generated sound spectrograms (produced with SoundEdit Pro and Audacity), we scored birds as having (a) a high note cluster, defined as a final introductory segment interval that includes at least two different note types; (b) a click train, defined as one or more introductory segment intervals that include at least two clicks and no other note types; or (c) both features36 (see Supplementary Fig. 1 for a full description of note types). Although a small proportion of birds (mean = 8.3%) included neither feature in their songs (such birds had either no feature in the introductory segment intervals or one non-click note type in the final interval), we did not include this option in the model and omitted these birds from summaries of the data. We did not include data after the 2013 breeding year because we began an experimental field tutoring study in the summer of 2013 (ref. 64).
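The scoring rule above can be expressed as a small classifier. This is a minimal sketch, not the authors' workflow: it assumes each song has already been transcribed from its spectrogram into a list of introductory-segment intervals, each holding the note-type labels heard in that interval, with clicks labelled `"click"`. The function name and label scheme are hypothetical, and the sketch treats any two distinct labels in the final interval as a high note cluster.

```python
from typing import List, Optional


def score_song(intervals: List[List[str]]) -> Optional[str]:
    """Classify a song's introduction as "X", "C", "XC", or None.

    `intervals` is a hypothetical transcription: one list of note-type labels
    per introductory-segment interval, in temporal order, with clicks
    labelled "click".
    """
    if not intervals:
        return None
    # High note cluster: the final interval holds at least two different note types.
    has_cluster = len(set(intervals[-1])) >= 2
    # Click train: any interval with at least two clicks and no other note types.
    has_clicks = any(
        len(interval) >= 2 and all(note == "click" for note in interval)
        for interval in intervals
    )
    if has_cluster and has_clicks:
        return "XC"
    if has_cluster:
        return "X"
    if has_clicks:
        return "C"
    return None  # neither feature: such birds were omitted from summaries
```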

Modelling

We used a dynamic, discrete-time model that allowed us to focus our analysis on specific time points within the year that are related to song learning (the beginning and end of the breeding season). These were: (1) the return of older birds between breeding seasons, (2) the recruitment of young birds singing newly crystallized songs in the spring, and (3) reproduction, resulting in the addition of juveniles during the summer breeding season.

Because survival data were not available for every year during the time span we studied, we captured the variation in survival rates observed in the field57 by using a binomial distribution centered on the average historical survival rate for each age class (addressing the possibility that cultural drift resulting from random differences in survival rates was responsible for the shift in song features). The model incorporates stochasticity to capture the variation in population dynamics and return rates by assigning parameter values for survival and return rates from empirically generated probability distributions.

We did not include spatial distribution of song variants in the model; although spatial patterns can be important for the dynamics of language loss58, territories with birds singing click trains and high note clusters were intermixed and no spatial structure was apparent (Fig. 3).

The model assumes that males choose which features to incorporate into the introductory sections of their songs during song development. Individuals fall into one of six mutually exclusive classes of male Savannah sparrows. The classes are defined by (1) the bird’s developmental stage in the song learning process: juvenile (J, the first year, when the song is plastic) or adult (A, after the first spring, when the song is crystallized), and (2) the variant or variants sung as part of the bird’s introduction (high note clusters, click trains, or both). Denoting high note clusters with X and click trains with C, the adult classes are therefore AX, AC, and AXC, and the juvenile classes are JX, JC, and JXC. The sum of the individuals in these classes is the total male population.

We used two time points in each year (late spring and late summer), corresponding to stages in song development (Fig. 5). At a given time t, when breeding is underway in the late spring, the male population consists entirely of adults singing crystallized song, and therefore each juvenile class is empty. By the end of the summer, the male population has been augmented by juveniles, which are initially assigned to the same variant class as their fathers. To capture these dynamics, we define an intermediate, late-summer time step, denoted tᵢ. Time t + 1 then corresponds to the following breeding season (late spring), when juvenile males hatched the previous year have completed song development, crystallized their songs, and joined the adult class.

Fig. 5: Model of song development.

We used two age classes (J = juvenile and A = adult) and three classes of introductions (C = click trains, X = high note clusters, and XC = both). In the late spring of a given year (time = t), only adult males are present. In late summer, those adults have bred and both they and juvenile males are present; at this intermediate time (tᵢ) each juvenile male is initially allocated the same introduction type as his father (solid lines). Then, as song development progresses and juvenile males can be influenced by other tutors, they may retain their initial introduction type or switch to either of the other two types (dashed lines) before they crystallize their songs late in the following spring (time = t + 1) and join the breeding cohort, which also includes adult males from the previous year that returned to breed again.


In late summer, the male population increases with the addition of juveniles hatched that year, some of which will return to join the singing population the following year; survivors return to breed within a few hundred meters of where they hatched51. To fit the observed historical decline in the Kent Island population57, the total number of returning juveniles, r (including both those hatched on site and those immigrating from nearby populations), follows a Poisson distribution with mean m = 33.6 − 0.182x, where x is the number of years since 1980 (this function results in a decline of 5 males per decade; the initial number of males on the study site used in the model, 70, was extrapolated from historical data). The size of each returning juvenile class at time tᵢ then takes the form:

$$\mathrm{JY}_{t_i} \sim \mathrm{Poisson}(m)\,\frac{\mathrm{AY}_{t_i}}{\mathrm{AX}_{t}+\mathrm{AC}_{t}+\mathrm{AXC}_{t}}$$

(1)

for each Y ∈ {X, C, XC}.
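Equation (1) can be sketched as follows. This is a minimal illustration in Python rather than the embedded C code used in the study; the function names and the dictionary keyed by class label (`"X"`, `"C"`, `"XC"`) are hypothetical. It draws the total number of returning juveniles r from a Poisson distribution with mean m = 33.6 − 0.182x and splits r among the juvenile classes in proportion to the adult classes, so class sizes are real-valued, as in Eq. (1).

```python
import numpy as np


def poisson_mean(year: int) -> float:
    """Expected number of returning juveniles, m = 33.6 - 0.182x (x = years since 1980)."""
    return 33.6 - 0.182 * (year - 1980)


def recruit_juveniles(adults: dict, year: int, rng: np.random.Generator) -> dict:
    """Draw r ~ Poisson(m) and split it in proportion to the fathers' classes (Eq. 1).

    `adults` maps the hypothetical class labels "X", "C", "XC" to adult counts.
    """
    r = rng.poisson(poisson_mean(year))
    total_adults = sum(adults.values())
    # Each juvenile class initially carries its father's variant.
    return {y: r * n / total_adults for y, n in adults.items()}


rng = np.random.default_rng(42)
juveniles = recruit_juveniles({"X": 30, "C": 20, "XC": 10}, 1985, rng)
```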

After the following winter, the proportion of adults surviving to time t + 1 follows a binomial distribution with mean survival rate s = 0.48, derived from historical data. Therefore, each adult class takes the form:

$$\mathrm{AY}_{t+1} \sim \mathrm{Binomial}(\mathrm{AY}, s) \ast \mathrm{AY}_{t_i}$$

(2)
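A minimal sketch of the binomial survival step, assuming each adult survives the winter independently with probability s, so that the number of survivors in a class of size AY is drawn as Binomial(AY, s). Rounding real-valued class sizes to whole birds before the draw is an assumption of this sketch, and the names are hypothetical.

```python
import numpy as np

S = 0.48  # mean annual adult survival rate derived from historical data


def survive(adults: dict, rng: np.random.Generator, s: float = S) -> dict:
    """Return surviving adult counts per class, each drawn as Binomial(n, s).

    Class sizes are rounded to whole birds first (an assumption of this sketch).
    """
    return {y: int(rng.binomial(int(round(n)), s)) for y, n in adults.items()}
```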

At the beginning of the next breeding season, juveniles complete song learning64, choosing which variant to crystallize as part of the song, and enter an adult song class; thus all of the juvenile classes disappear at t + 1. Which adult class juveniles join depends on separate learning functions for each of the two variants, ϕX for the high note cluster and ϕC for the click train. Each ϕ function takes values between 0 and 1 and gives the probability of crystallizing the corresponding song form during the transition from natal year to breeding, depending on the frequency-dependent bias and selection parameters (see below). These functions thus determine the proportion of each feature in the next generation relative to that in the previous generation, giving:

$$\mathrm{AX}_{t+1} = \phi_{\mathrm{X}}^{2}\,\mathrm{JX}_{t_i} + (1-\phi_{\mathrm{C}})^{2}\,\mathrm{JC}_{t_i} + \phi_{\mathrm{X}}(1-\phi_{\mathrm{C}})\,\mathrm{JXC}_{t_i} + \mathrm{AX}_{t_i}$$

(3)

$$\mathrm{AC}_{t+1} = (1-\phi_{\mathrm{X}})^{2}\,\mathrm{JX}_{t_i} + \phi_{\mathrm{C}}^{2}\,\mathrm{JC}_{t_i} + (1-\phi_{\mathrm{X}})\,\phi_{\mathrm{C}}\,\mathrm{JXC}_{t_i} + \mathrm{AC}_{t_i}$$

(4)

$$\mathrm{AXC}_{t+1} = 2\phi_{\mathrm{X}}(1-\phi_{\mathrm{X}})\,\mathrm{JX}_{t_i} + 2\phi_{\mathrm{C}}(1-\phi_{\mathrm{C}})\,\mathrm{JC}_{t_i} + \phi_{\mathrm{X}}\phi_{\mathrm{C}}(1-\phi_{\mathrm{X}})(1-\phi_{\mathrm{C}})\,\mathrm{JXC}_{t_i} + \mathrm{AXC}_{t_i}$$

(5)

The probabilities of all song crystallization outcomes for juveniles whose fathers sang type X sum to one:

$$\phi_{\mathrm{X}}^{2} + (1-\phi_{\mathrm{X}})^{2} + 2\phi_{\mathrm{X}}(1-\phi_{\mathrm{X}}) = 1$$

(6)
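Equations (3)–(6) translate directly into code. The sketch below follows the equations as written; the function names are hypothetical, and class sizes are passed as dictionaries keyed by `"X"`, `"C"` and `"XC"`. The helper for sons of fathers singing X returns the three outcome probabilities whose sum is Eq. (6); the analogous identity holds for sons of fathers singing C.

```python
def crystallize(juv: dict, adults: dict, phi_x: float, phi_c: float) -> dict:
    """Adult class sizes at t + 1 following Eqs. (3)-(5) as written.

    `juv` and `adults` hold class sizes at the intermediate time step t_i.
    """
    jx, jc, jxc = juv["X"], juv["C"], juv["XC"]
    # Eq. (3): juveniles that end up singing only the high note cluster.
    ax = phi_x**2 * jx + (1 - phi_c)**2 * jc + phi_x * (1 - phi_c) * jxc + adults["X"]
    # Eq. (4): juveniles that end up singing only the click train.
    ac = (1 - phi_x)**2 * jx + phi_c**2 * jc + (1 - phi_x) * phi_c * jxc + adults["C"]
    # Eq. (5): juveniles that end up singing both forms.
    axc = (2 * phi_x * (1 - phi_x) * jx
           + 2 * phi_c * (1 - phi_c) * jc
           + phi_x * phi_c * (1 - phi_x) * (1 - phi_c) * jxc
           + adults["XC"])
    return {"X": ax, "C": ac, "XC": axc}


def outcome_probs_x_father(phi_x: float) -> tuple:
    """P(crystallize X), P(C), P(XC) for a son of an X father; the sum is Eq. (6)."""
    return phi_x**2, (1 - phi_x)**2, 2 * phi_x * (1 - phi_x)
```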

Learning curves

To define how young males’ song learning is influenced by the songs they hear, we used learning curves based on type III Holling response curves59, which provide a means to capture functional responses numerically. In our model, the type III curve models the response of juveniles to the song forms of adults in the population based on two variables: (1) frequency-dependent bias that favors one form based on its prevalence within the adult population, and (2) selection that favors a particular form of the song.

The learning curves, ϕX for the high note cluster and ϕC for the click train, are modified forms of the type III Holling response curve:

$$\phi_{\mathrm{X}} = \frac{x^{\beta}/\sigma}{(1-x)^{\beta} + x^{\beta}/\sigma}$$

(7)

and

$$\phi_{\mathrm{C}} = \frac{\sigma\,c^{\beta}}{(1-c)^{\beta} + \sigma\,c^{\beta}}$$

(8)

where x is the proportion of the high note cluster within the population, c is the proportion of the click train within the population, β is frequency-dependent bias (favoring either learning the novel variant or retaining the common one), and σ is selection on the novel variant (a preference for learning the variant that does not depend on its frequency and that includes factors such as prestige bias, success bias, status, and content bias). Note that the two learning curves are not identical: because σ acts on the novel variant (the click train) independently of its frequency, it multiplies the click-train term in ϕC and divides the high-note-cluster term in ϕX. In these equations, β > 1 corresponds to conformist bias (the common form is favored), and β < 1 favors the rare form. Values of σ > 1 correspond to selection for the novel variant, and values of σ < 1 to selection against it. The parameters β and σ allow us to test the relative roles of frequency-dependent bias and cultural selection, as well as various combinations of the two, using a single function that gives the probability that social learning will result in a juvenile male crystallizing a particular song variant.
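Equations (7) and (8) can be written as two short functions (the names are illustrative, not from the study's code). Two properties follow directly from the equations and make useful sanity checks: with β = 1 and σ = 1 both curves reduce to ϕ = x (or c), i.e. unbiased copying; and when c = 1 − x, multiplying the numerator and denominator of Eq. (7) by σ shows that ϕX + ϕC = 1.

```python
def phi_x(x: float, beta: float, sigma: float) -> float:
    """Eq. (7): probability of crystallizing the high note cluster at frequency x."""
    num = x**beta / sigma  # sigma > 1 penalizes the established variant
    return num / ((1 - x)**beta + num)


def phi_c(c: float, beta: float, sigma: float) -> float:
    """Eq. (8): probability of crystallizing the click train at frequency c."""
    num = sigma * c**beta  # sigma > 1 favors the novel variant
    return num / ((1 - c)**beta + num)
```

For example, with β = 2 and σ = 1 a variant at 80% frequency is crystallized with probability 0.64/(0.04 + 0.64) ≈ 0.94, illustrating conformist bias.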

Males that sang both high note clusters and click trains (the AXC class) could be interpreted in one of two ways within this framework:

Two-trait: by counting each variant individually, so that a bird singing both variants is counted twice in calculations of variant frequencies (once for high note clusters, and once for click trains), while a bird singing one form is counted only once. In this scenario, frequencies were calculated as (time subscripts omitted for clarity):

$$\mathrm{P}_{\mathrm{C}} = \frac{\mathrm{AC}+\mathrm{AXC}}{\mathrm{AC}+\mathrm{AX}+2\,\mathrm{AXC}}$$

(9)

and

$$\mathrm{P}_{\mathrm{X}} = \frac{\mathrm{AX}+\mathrm{AXC}}{\mathrm{AC}+\mathrm{AX}+2\,\mathrm{AXC}}$$

(10)

Blended trait: each bird was counted once, so that a bird singing both variants contributed half a count to each (equivalently, birds that sang a single variant were weighted twice as much as those that sang both). In this scenario, frequencies were calculated as:

$$\mathrm{P}_{\mathrm{C}} = \frac{2\,\mathrm{AC}+\mathrm{AXC}}{2(\mathrm{AC}+\mathrm{AX}+\mathrm{AXC})}$$

(11)

and

$$\mathrm{P}_{\mathrm{X}} = \frac{2\,\mathrm{AX}+\mathrm{AXC}}{2(\mathrm{AC}+\mathrm{AX}+\mathrm{AXC})}$$

(12)
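The two counting schemes differ only in how AXC males are weighted. A minimal sketch (function names are hypothetical) returning (P_X, P_C) under each scheme:

```python
def freqs_two_trait(ax: float, ac: float, axc: float) -> tuple:
    """Eqs. (9)-(10): AXC birds are counted once for each variant."""
    denom = ac + ax + 2 * axc
    return (ax + axc) / denom, (ac + axc) / denom  # (P_X, P_C)


def freqs_blended(ax: float, ac: float, axc: float) -> tuple:
    """Eqs. (11)-(12): each bird counted once; AXC birds split half and half."""
    denom = 2 * (ac + ax + axc)
    return (2 * ax + axc) / denom, (2 * ac + axc) / denom  # (P_X, P_C)
```

For example, with 10 AX, 5 AC and 5 AXC males, the two-trait scheme gives P_C = 10/25 = 0.40, whereas the blended scheme gives P_C = 15/40 = 0.375.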

Innovations

As most males singing click trains in the 1980s and early 1990s also sang a high note cluster, we assumed that the innovators’ songs included both forms. We know that click trains first appeared in the population between 1983 and 1987, as they were absent in 1982 recordings and present in 1988 recordings. Prior to 1983, all adults sang high note clusters and so belonged to the AX class. We modeled the appearance of click trains in the population with the term in, which represented the number of innovators (which we modeled as entering the population in class AXC, see the next section), and was added in any year from 1983 to 1987. To maintain populations at consistent levels, we subtracted the number of innovators from the AX class in the year the innovation was introduced.
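The innovation step amounts to moving a fixed number of birds from AX to AXC in the chosen year. A minimal sketch with hypothetical names; the defaults (1983, two innovators) mirror the default values described in the next section, and the guard against moving more birds than AX contains is an added assumption.

```python
def add_innovators(adults: dict, year: int, innovation_year: int = 1983,
                   n_innovators: int = 2) -> dict:
    """Move innovators from AX to AXC in the innovation year; otherwise no change."""
    updated = dict(adults)  # leave the caller's state untouched
    if year == innovation_year:
        # Assumption: never remove more AX birds than the class contains.
        moved = min(n_innovators, updated["X"])
        updated["X"] -= moved   # keeps total population size constant
        updated["XC"] += moved  # innovators sing both forms
    return updated
```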

Choice of values for innovators and years

First, we assumed that interstitial notes, whether high note clusters, click trains, or both, represented a single trait. We tested this assumption by running the model with birds that sang both forms treated either (1) as carrying a blended trait or (2) as carrying two distinct traits (see Supplementary Table 2 and Supplementary Fig. 2); the blended-trait model fit the data better.

We know from the corpus of recordings that click trains were not observed in 1980 or 1982, when high note clusters were the prevalent form. Click trains were first recorded in 1988. Because we do not have recordings for the period spanning 1983 to 1987, each of these years is potentially the time of the initial introduction. We used the earliest possible year, 1983, as the default, because we observed potential precursors of the click train in 1982 songs. We also modeled the appearance of initial innovations for the years 1984 through 1987 (Supplementary Table 3 and Supplementary Fig. 3).

The number of innovators (individuals that sang the click train in the first year it appeared on the study site) is unknown. We chose a default value of 2 males (2.9% of the study population of 70) for two reasons. First, innovations we have observed in other segments of Savannah sparrow songs initially appeared in the songs of 2 or 3 individuals. Second, this “mutation rate”, µ = 0.029 per song per year, is within the range found in previous work on the introduction of innovations in learned songs: 0.001 to 0.035 per year in U.K. chaffinches85, and ~0.057 in New Zealand chaffinches86. This value is also in the middle of the range used to model human cultural evolution (0.004 to 0.128)87. We varied the number of innovators from 1 to 8 (µ = 0.014 to µ = 0.114) to assess the effect of this parameter on the model’s results (see Supplementary Table 4 and Supplementary Fig. 4).
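The mutation rates quoted above are the number of innovators divided by the initial study population of 70 males; the loop below prints 0.014, 0.029 and 0.114 for 1, 2 and 8 innovators. The names are hypothetical.

```python
STUDY_POPULATION = 70  # initial number of males on the study site in the model


def mutation_rate(n_innovators: int, population: int = STUDY_POPULATION) -> float:
    """Innovators as a fraction of the study population (per song per year)."""
    return n_innovators / population


for n in (1, 2, 8):
    print(n, round(mutation_rate(n), 3))
```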

Our models thus used, as default values, two innovators, appearing in 1983, that sang both click trains and high note clusters as a blended trait, and we tested the effects on the modeling results by varying these default values.

Implementation and evaluation

The model was implemented in the R88 package POMP89 (Partially Observed Markov Processes), using embedded C code. We performed a grid search over the parameters σ and β (from 0.5 to 2.0 in steps of 0.05 for each parameter, unless otherwise stated) and calculated the estimated log likelihood for each parameter combination. We used an initial burn-in of 50 years before the first year for which we compared the model with existing data (1980). We repeated this analysis for each set of initial conditions (the year the innovation was introduced, and blended vs. two-trait categorization for birds that sang both high note clusters and click trains). We visualized the model space with heat maps prepared in MATLAB and identified the maximum likelihood estimate (MLE) and the corresponding 95% confidence intervals. Using the best-fit parameters (those corresponding to the MLE), we then ran the model 50 more times to generate average and 95% CI trajectories for the frequencies of song variants, and plotted them in the same manner as the observed field data.
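The grid search can be sketched in Python as follows. The study used R, pomp and embedded C, so this is only an illustration: `loglik` is a hypothetical placeholder for the model's simulation-based log-likelihood estimate, and the likelihood-ratio cutoff that marks an approximate joint 95% region (points within 5.991/2 log-likelihood units of the maximum, from χ² with 2 degrees of freedom) is an assumed, commonly used rule rather than the paper's stated procedure.

```python
import numpy as np


def grid_search(loglik, lo: float = 0.5, hi: float = 2.0, step: float = 0.05):
    """Evaluate loglik(sigma, beta) on a regular grid and locate the maximum.

    Returns (sigma_hat, beta_hat, surface, region); rows index sigma, columns beta.
    """
    n = int(round((hi - lo) / step)) + 1  # 31 values for 0.5..2.0 by 0.05
    values = np.linspace(lo, hi, n)
    surface = np.array([[loglik(s, b) for b in values] for s in values])
    i, j = np.unravel_index(np.argmax(surface), surface.shape)
    # Assumed rule: joint 95% region from a chi-squared (2 df) likelihood-ratio cutoff.
    region = surface >= surface[i, j] - 5.991 / 2
    return values[i], values[j], surface, region
```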

Song playback study

We tested the responses of Savannah sparrows on their territories in early July of 2011 (when most pairs were feeding young or beginning a second clutch) to song segments with click trains that included different numbers of clicks. None of the songs of 39 birds recorded on the study site in 2011 included high note clusters. The mean number of clicks within click trains was 3.93, ranging from 0 (3 birds) to 7 (3 birds), with a mode of 4 clicks in a train (n = 16). All of the subjects of the playback study would have had the opportunity to hear click trains ranging from 0 to 7 clicks, but would not have been familiar with high note clusters. Because comparisons of responses to songs with click trains and high note clusters would have been confounded by the issue of familiarity, we only tested subjects’ responses to the number of clicks in a train. (A test of the efficacy of click trains and high note clusters in hand-reared birds that had not been exposed to either form might address the question of how preferences may be shaped by social learning).

The stimuli were constructed from high-quality recordings of introductory sections from the songs of 12 different males to produce 12 different stimulus sets, to avoid pseudoreplication. The introductory sections of the twelve songs were originally composed of 5–8 introductory notes, between which were 1–3 click trains that included 3–7 clicks. Each of these introductory segments was extracted and then digitally altered (using Audacity, audacityteam.org) to produce a set of four different stimuli that included 0, 2, 4, or 7 clicks in each click train. The introductory notes, the temporal spacing of the introductory notes, and the length of the entire introductory segment were the same for each stimulus within a set. Clicks were added to a train by duplicating existing clicks and adjusting them to be evenly spaced within the interval between introductory notes. Clicks were removed by replacing clicks at the end of a train with silence. Because introductory notes are substantially longer (mean = 67 ms) than clicks (mean = 2 ms), a change of one click in a click train stimulus represented a change of, on average, 0.91% in the signal duration (taking into account that adding one click to a train meant adding one click to all instances of that train within a stimulus). Introductory notes are also substantially louder than clicks, and so the overall change in sound intensity between stimuli was very small. To the human ear, longer click trains make the intervals between the louder, longer introductory notes sound somewhat “raspier” than shorter click trains, but the difference is subtle.
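Evenly spacing the clicks within an inter-note interval can be sketched as below. This assumes that "evenly spaced" means equal silences before the first click, between clicks, and after the last click; the function name is hypothetical, and the default click duration is the 2 ms mean reported above. Times are in seconds.

```python
def click_onsets(gap_start: float, gap_end: float, n_clicks: int,
                 click_dur: float = 0.002) -> list:
    """Onset times (s) for n_clicks clicks spread evenly across a silent gap.

    Assumes equal silences before, between, and after the clicks.
    """
    if n_clicks == 0:
        return []
    silence = (gap_end - gap_start - n_clicks * click_dur) / (n_clicks + 1)
    if silence < 0:
        raise ValueError("gap too short for the requested number of clicks")
    # Click k starts after k + 1 silences and k earlier clicks.
    return [gap_start + silence * (k + 1) + click_dur * k for k in range(n_clicks)]
```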

Each of 25 male subjects was tested with all four stimuli from one set. Each trial started with a “primer”, a stimulus consisting of introductory notes without interstitial notes55. Two minutes after the bird’s response ended, the first test stimulus was presented for two minutes (at 12-second intervals). The next stimuli were presented in succession, with a delay of two minutes after the bird’s response ended for each stimulus. Stimuli were presented in a randomized order, and each stimulus set was used at least twice. The response duration and behaviours of males (crouching with head feathers flattened close to the skull, aggressive displays48 and vocalizations90) were noted. We used duration, measured as time from the end of the stimulus presentation until the male ceased responding (defined as moving 20 m or more away from the speaker, singing a full and loud song, or engaging in feeding or preening behaviour), as our primary measure of male response55. Because the strength of the response varied across birds, we normalized response durations for each individual bird in Fig. 4c. To correct for a rightward skew in the distribution, we log-transformed the raw response duration measure and assessed the relationship between response duration and number of clicks (F1,73 = 10.97, P < 0.005), using a generalized mixed-effects model implemented with the lme4 package91 in R, which included the identity of the subject (F24,73 = 3.84, P < 0.000001) as well as the trial order (F1,73 = 0.012, P > 0.9) as random effects. We did not record songs produced during stimulus playback; we observed an average of 0.6 songs per trial, which would not have provided a large enough sample size for analysis.

Females did not always respond to the playback stimuli. When they did respond (in 11 of 25 trials) their responses differed from those of males: females typically stood erect rather than crouching, elevated their crest feathers instead of flattening them, and were never observed to give aggressive wing flutters or vocalizations but rather hopped towards the speaker while peering about alertly. Because female responses to other song stimuli presented in previous studies used the postures and behaviours typical of male aggressive responses, we interpret the approach with an erect posture and crest as having a different valence: investigative/approach rather than aggressive. We noted both which stimuli the females approached and which stimulus they first approached and evaluated the effects of click number with a Chi-squared test.
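A minimal sketch of a chi-squared test on which stimulus females approached first, assuming a goodness-of-fit test against equal expected counts across the four stimuli (0, 2, 4 and 7 clicks); the exact form of the test is not specified above. With four categories the test has 3 degrees of freedom, for which the chi-squared upper-tail probability has a closed form, so no statistics library is needed. The function name is hypothetical.

```python
import math


def chi_square_uniform_4(counts):
    """Chi-squared goodness-of-fit of four counts against equal expected counts.

    Returns (statistic, p_value) with df = 3, using the closed-form tail
    P(X > x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2).
    """
    if len(counts) != 4 or sum(counts) == 0:
        raise ValueError("expects four counts (0, 2, 4, 7 clicks) with a nonzero total")
    expected = sum(counts) / 4
    stat = sum((o - expected) ** 2 / expected for o in counts)
    p = math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)
    return stat, p
```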

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.


Source: Ecology - nature.com
