Mothers with higher twinning propensity had lower fertility in pre-industrial Europe

Data preparation

Historical data

The primary source of data is historical parish registers, which have been transcribed under the supervision of many of the study authors over a number of decades, primarily for evolutionary demographic research. Our dataset (Supplementary Data 1) includes nine European populations, including some for which the positive relationship between maternal lifetime twinning status and maternal total births has been described previously^5,6,8,10. Details for the populations used in this study are given in Table 1 and in Supplementary Table 14. The sourcing of each dataset and the socio-ecological background of each population have already been described in previous studies (see Table 1 for references). Overall, there is no reason to suspect a high level of consanguinity in these populations⁶², so our analyses do not account for the variable level of relatedness between individuals. The datasets cover pre-industrial periods in which the lifetime reproductive success was high and the majority of people were living and working in agrarian communities, except for the Samis (from northern Finland and Sweden) who made their living fully or in part from a combination of herding, fishing and hunting. The smooth decline of the probability of parity progression with parity (Fig. 4a) suggests that mothers did not effectively limit their reproductive success with the aim of achieving a small family size, as found in populations that have undergone a demographic transition.

Data selection

We use the term family to describe a mother and all individuals to whom she gave birth over her life. For our analyses, all families considered met the following criteria: the mother’s age was known at a monthly resolution and her life course traced until at least age 45 (approximating full reproductive life), the birth year and month of all offspring must have been recorded and consecutive births were all at least nine months apart from one another. In the case of one population (Norway) and of a few observations in the other populations, the month of birth was not available. These data were thus not considered in the results presented in main text because some of our analyses require an accurate estimation of the interbirth interval. Most analyses are thus based on data from eight populations. Nevertheless, the slope of the negative relationship between twinning and total birth remained very similar irrespectively of whether or not such data were included (Supplementary Fig. 5), which suggests that the exclusion of Norwegian data does not alter our main conclusions. Information on the populations considering also the data for which the birth months were missing is provided in Supplementary Table 14.

Twin identification

In our data, the maximum number of offspring to constitute a multiple birth was three. We use the term twin(s) to refer to offspring who were the result of the same multiple birth (including 1745 sets of twins sensu stricto and 19 sets of triplets in the filtered dataset and, respectively, 1915 and 20 in the dataset including observations that lack birth months). Although twins are sometimes explicitly indicated in the data sources, this is not always the case. Thus, for the sake of consistency across our populations, twin births were identified when at least two individuals born to the same mother appeared with similar birth dates, according to strict criteria: if the exact birth dates were available, then offspring were identified as twins if their birth dates were no more than one day apart. If the exact birth dates were not available, then an identical birth year and month were considered sufficient for positive twin identification.

Analyses and simulations

Characterisation of the relationship between twinning and fertility

We began by characterising the relationship between lifetime twinning status and maternal total births by fitting two models. For the first, we used a Generalised Linear Mixed-effects Model (GLMM) to investigate whether the mothers of twins (twinners) had experienced a larger or smaller number of births than mothers who only had singletons (non-twinners). We fitted this GLMM on the mother-level data with the R package spaMM⁶³ using the call:

$$begin{array}{c}{{{rm{fitme}}}}left({{{{rm{births}}}}}_{{{{rm{total}}}}}sim 1+{{{rm{twinner}}}}+(1|{{{rm{pop}}}}),right. {{{rm{data}}}}={{{rm{mother}}}}_{{{rm{level}}}}_{{{rm{data}}}}, left.{{{rm{family}}}}={{{rm{Tnegbin}}}}({{{rm{link}}}}={hbox{“}}{{{mathrm{log}}}}{hbox{”}})right)end{array}$$

(1)

The response variable births_total refers to all births recorded over a mother’s observed lifetime (count data). The term 1 informs the function to fit an intercept (which happens by default, but is indicated here for clarity). The predictor twinner refers to maternal lifetime twinning status (binary: twinner vs non-twinner) and is modelled as a fixed effect. The term pop refers to the population identity (qualitative variable with eight levels) and is modelled by a Gaussian random effect acting on the intercept, which allows for the modelling of the heterogeneity between populations that is not captured by the fixed effects. The argument family is used to define the error structure and the link function of the GLMM (more on this below).

In a second model, we reversed which variable is used as a response and which is used as the fixed-effect predictor. This allowed us to analyse how maternal total births predicted the probability of a mother producing twins during her lifetime using the call:

$$begin{array}{c}{{{{{rm{fitme}}}}}}left({{{{{rm{twinner}}}}}} sim 1+{{{{{rm{births}}}}}}_{{{{{rm{total}}}}}}+(1|{{{{{rm{pop}}}}}}),right. {{{{{rm{data}}}}}}={{{{{rm{mother}}}}}}_{{{{{rm{level}}}}}}_{{{{{rm{data}}}}}}, left.{{{{{rm{family}}}}}}={{{{{rm{binomial}}}}}}({{{{{rm{link}}}}}}={hbox{“}}{{{{{rm{logit}}}}}}{hbox{”}})right)end{array}$$

(2)

The fitted models 1 and 2 are depicted in Fig. 1a, b and Supplementary Tables 1,2. While models 1 and 2 represent two sides of the same coin, the fit of both models is justified because each model formulation provides complementary information: expressing the effect of twinning on fertility relates to previous work^{5,6,7,8,9,10,57} and expressing the effect of fertility on twinning is a first step toward identifying what shapes twinning propensity, the focus of this paper.

For the model predicting total births (model 1), we chose to use a negative binomial error structure. Using this error structure produced a fit of the data that was better than a (truncated) Poisson—the usual alternative for count data—as evidenced by much smaller marginal and conditional AIC values⁶⁴. Here we specifically used a truncated negative binomial distribution because the data do not possess zeros by construction (only mothers are present in the dataset, i.e. there are no nulliparous women). For the model predicting lifetime twinning status (model 2), we chose a binomial error structure which is appropriate for binary data.

Modelling the proportion of twin births among all births per mother is an effective way to avoid biases caused by differences in exposure to the risk of having twins affecting the relationship between twinning and fertility. For this, we fitted the following third model:

$$begin{array}{l}{{{{{rm{fitme}}}}}}left({{{{{rm{cbind}}}}}};({{{{{rm{twin}}}}}}_{{{{{rm{total}}}}}},{{{{{rm{singleton}}}}}}_{{{{{rm{total}}}}}})right. sim ,1+{{{{{rm{births}}}}}}_{{{{{rm{total}}}}}}+(1|{{{{{rm{pop}}}}}}), ,{{{{{rm{data}}}}}}={{{{{rm{mother}}}}}}_{{{{{rm{level}}}}}}_{{{{{rm{data}}}}}}, ,left.{{{{{rm{family}}}}}}={{{{{rm{binomial}}}}}};({{{{{rm{link}}}}}}={hbox{“}}{{{{{rm{logit}}}}}}{hbox{”}})right)end{array}$$

(3)

In this model, the variable twin_total refers to the mother’s total number of twin births (i.e. one for each twinning event), singleton_total refers to the lifetime number of singleton births, and the cbind() function serves to indicate the fitting function to model the frequency of twinning events based on these two variables, which is interpreted as number of successes and failures of a binomial experience. The fitted model 3 is depicted in Fig. 2 and Supplementary Table 3.

We modified model 3 so as to test whether the effect of total births differed significantly between populations. To do so, we considered that the effect of populations on total births could either be modelled as an interaction between fixed effects or as a random slope. For the former representation, we thus compared the fit of a model with linear predictor structure defined in spaMM as 1+births_total*pop to that of a model with the structure 1+births_total+pop. For the latter representation, we compared the fit with linear predictor structure 1+births_total + (1|pop) (i.e. model 3 as introduced above) to that of 1+births_total + (1+births_total|pop). We performed this testing procedure by comparing the likelihood ratio between each pair of alternative fits to the expectation of such a ratio under the null hypothesis. The distribution of the statistics used for the test was computed using 1000 parametric bootstrap replicates, which we generated using the function anova() provided by spaMM⁶³. The test revealed a small non-significant variation in slopes between populations (see Results). For the sake of simplicity, we thus considered the effect of births_total the same across populations in all other analyses.

Modelling life-history events using GLMMs

To reveal the biological mechanisms responsible for the relationship between twinning and fertility, we first fitted statistical models describing how age, parity and twin/singleton status, as well as individual and population differences influenced three key life-history events: parity progression (PP), the duration of interbirth intervals (IBI) and the twinning outcome of births (T). These models were fitted on birth-level data by the following calls:

$$begin{array}{l}{{{{{rm{fitme}}}}}}left({{{{{rm{PP}}}}}} sim 1+{{{{{rm{twin}}}}}}+{{{{{rm{poly}}}}}}({{{{{rm{cbind}}}}}}({{{{{rm{age}}}}}},,{{{{{rm{parity}}}}}}),,{{{{{rm{best}}}}}}_{{{{{rm{order}}}}}})right. ,+,(1|{{{{{rm{maternal}}}}}}_{{{{{rm{id}}}}}})+(1|{{{{{rm{pop}}}}}}), ,{{{{{rm{data}}}}}}={{{{{rm{birth}}}}}}_{{{{{rm{level}}}}}}_{{{{{rm{data}}}}}}, ,{{{{{rm{family}}}}}}={{{{{rm{binomial}}}}}};({{{{{rm{link}}}}}}={hbox{“}}{{{{{rm{logit}}}}}}{hbox{”}})end{array}$$

(4)

$$begin{array}{l}{{{{{rm{fitme}}}}}}left({{{{{rm{IBI}}}}}} sim 1+{{{{{rm{twin}}}}}}+{{{{{rm{poly}}}}}};({{{{{rm{cbind}}}}}}({{{{{rm{age}}}}}},,{{{{{rm{parity}}}}}}),{{{{{rm{best}}}}}}_{{{{{rm{order}}}}}})right. ,+,(1|{{{{{rm{maternal}}}}}}_{{{{{rm{id}}}}}})+(1|{{{{{rm{pop}}}}}}), ,{{{{{rm{data}}}}}}={{{{{rm{birth}}}}}}_{{{{{rm{level}}}}}}_{{{{{rm{data}}}}}}, ,left.{{{{{rm{family}}}}}}={{{{{rm{negbin}}}}}}({{{{{rm{link}}}}}}={hbox{“}}{rm log} {hbox{”}})right)end{array}$$

(5)

$$begin{array}{l}{{{{{rm{fitme}}}}}}left({{{{{rm{T}}}}}} sim 1+{{{{{rm{poly}}}}}}({{{{{rm{cbind}}}}}}({{{{{rm{age}}}}}},,{{{{{rm{parity}}}}}}),,{{{{{rm{best}}}}}}_{{{{{rm{order}}}}}})right. ,+,(1|{{{{{rm{maternal}}}}}}_{{{{{rm{id}}}}}})+(1|{{{{{rm{pop}}}}}}), ,{{{{{rm{data}}}}}}={{{{{rm{birth}}}}}}_{{{{{rm{level}}}}}}_{{{{{rm{data}}}}}}, ,left.{{{{{rm{family}}}}}}={{{{{rm{binomial}}}}}};({{{{{rm{link}}}}}}={hbox{“}}{{{{{rm{logit}}}}}}{hbox{”}})right).end{array}$$

(6)

The response variables of models 4, 5 and 6 are thus PP, IBI and T, which refer to whether the mother went on to reproduce again or not (a boolean), the duration of the interbirth interval between the focal birth and the next (a discrete number of months) and whether the birth resulted in twins or not (a boolean), respectively. In addition to the terms that have already been defined, we now have the term poly(cbind(age, parity), best_order) to code for a polynomial describing the effect of maternal age, parity and their possible interaction. The two-variable polynomial function was applied on maternal age (with a monthly resolution) and parity (i.e. the current birth rank). Such a polynomial term allowed us to explore the influences of maternal age and parity on each response variable while encompassing the non-linearity of these predictors. We also have the predictor variable twin, which is a boolean that indicates if the previous birth event of a given mother resulted in twins or not (the variable twin and T are the same, but we used two different names to clarify when it is used as a response or as a predictor). Finally, we have the random effect “maternal identity” (maternal_id), which is used to represent intrinsic variation among mothers, that is, heterogeneity of expected response among individuals, beyond that due to the fixed effects and the population random effect. This random effect therefore measures maternal intrinsic fertility (in models 4 & 5) and twinning propensity (in model 6).

To determine the best polynomial order (best_order) for the polynomial term we attempted orders from 0 to 6 and selected, for each model, the order leading to the model fit associated with the smallest marginal AIC. A polynomial of order 6 is sufficient to fit very complex shapes. Polynomial orders obtained by this procedure are given in the summary tables of the model fits given in Supplementary Tables. Importantly, maternal age and parity are highly correlated together (Spearman’s rho = 0.69), unequally correlated to response variables and exert non-linear effects. These are precisely the conditions in which collinearity issues are the most severe⁶⁵. This justifies why we considered them jointly in all statistical models, as well as why we did not attempt to partition their respective biological effects in our analyses (except for the visual representation in Fig. 4).

In order to study how the lifetime twinning status influenced maternal age at first birth, we also fitted the following model:

$$begin{array}{l}{{{{{rm{fitme}}}}}}left({{{{{rm{AFB}}}}}} sim 1+{{{{{rm{twinner}}}}}}* {{{{{rm{births}}}}}}_{{{{{rm{total}}}}}}_{{{{{rm{fac}}}}}}+(1|{{{{{rm{pop}}}}}}),right. ,{{{{{rm{data}}}}}}={{{{{rm{mother}}}}}}_{{{{{rm{level}}}}}}_{{{{{rm{data}}}}}}, ,left.{{{{{rm{family}}}}}}={{{{{rm{negbin}}}}}}({{{{{rm{link}}}}}}={hbox{“}}{rm log} {hbox{”}})right)end{array}$$

(7)

In this model, the response variable AFB corresponds to the age at first birth expressed as a number of months (discrete data) and the predictor births_total_fac corresponds to a qualitative variable referring to maternal total births (10 levels: 1, 2, …, 9, 10 + ). We here considered a possible interaction between twinner and births_total_fac. We used the negative binomial family as in model 1 but as for model 5 there is no need to consider here the truncated form of the distribution. All other terms have already been defined. The fitted model is depicted in Supplementary Fig. 1 and Supplementary Table 7.

Marginal predictions for GLMMs

All predictions shown in plots or given in text represent marginal predictions. This means that the predictions for the quantities of interest (maternal lifetime births, twinning probabilities and age at first birth) are a function of coefficients of the fixed effects, and of the variance of the random effects. To be more precise, we averaged, over the fitted distribution of random effects, the predictions expressed on the scale of the response (i.e. back-transformed from the scale of the linear predictor) and conditional on the fixed and random effects. Unlike the traditional conditional predictions computed for a specific value of the random effects (often 0), such computation provides unbiased predictions and should be favoured in the context of GLMMs where random effects act non-additively on the expected response (which is the case when the link function of the model is not identity⁶⁶). We estimated 95% intervals for these marginal predictions (CI_95%) using parametric bootstraps with the help of the function spaMM_boot() from the R package spaMM and boot.ci() from the R package boot. More details can be found by looking at the code of the functions compute_predictions() and compare_prediction() in our supporting R package twinR (see Code availability).

Simulating the life history of mothers

We produced an individual-based simulation model of human female life history to investigate the contribution of four mechanisms to the relationship, shown in Fig. 2, between per-birth twinning probability and maternal total births—an approach generally known as pattern-oriented modelling⁶⁷. Each simulation proceeds in the following way: first, we initialised the simulation with representations of the exact same mothers present in the observed dataset, setting their population and maternal identities as the real ones, their starting ages at the observed values for age at first birth and their parity to one. Following this initialisation, the virtual lives of mothers proceeded as multiple iterations of a sequence of three life-history events, informed by statistical models (see below) and subject to the hypothesis being tested (Supplementary Figs. 3, 4). Specifically, for each mother, the twin/singleton status (T) of the current birth was first determined using a GLMM predicting T. Then, whether or not she will go on to reproduce at least once more was determined by simulating her parity progression status (PP) using a GLMM predicting the parity progression probability. For mothers who do continue reproducing, we finally used a third GLMM to determine the length of the interval between the current birth and the next one (IBI). At each iteration, a mother’s parity is increased by one, and age is increased by the simulated length of the interbirth interval. All predictions were performed conditionally on the value for the predictor characterised by both fixed and random effects. The process of simulating PP, IBI and T was then reiterated until all women had ceased reproduction, which happens necessarily since the probability of parity progression is lower than one. We also set this probability to zero once women reached 60 years old to save computation time in particular simulation scenarios leading to unrealistic life histories (and bad goodness of fit). Note that the maximum recorded age at which a mother gave birth was 55.1 years in our data. For the same reason, we also capped the maximum simulated duration for the interbirth interval to 30 years.

Drawing life-history events from the fit of the model formulas shown above for models 4, 5 and 6 corresponds to simulating the scenario PISH (i.e. all four hypothetical mechanisms are activated). For simulating other simulation scenarios, we had to fit additional GLMMs derived from models presented above. Specifically, the term twin was dropped from model 4 to deactivate mechanism P (model 8; Supplementary Table 8); the term twin was dropped from model 5 to deactivate mechanism I (model 9; Supplementary Table 9); the term poly(cbind(age, parity), best_order) was dropped from model 6 to deactivate the mechanism S (model 10 and 11; Supplementary Tables 10, 11); and the term (1|maternal_id) was dropped from model 6 to deactivate mechanism H (model 11 and 12; Supplementary Tables 11, 12).

Testing candidate mechanisms using simulations

To test how each mechanism or association of mechanisms influenced the relationship between twinning and fertility, we ran simulations under each possible set of activated or inactivated mechanisms. We tested all possible sets and we thus built a total of 4² = 16 simulation scenarios (Supplementary Figs. 3, 4).

For each simulation scenario, we ran simulation replicates (see Supplementary Notes for details and information on the numbers of replicates), then fitted model 3 on the dataset produced by each replicate and extracted the estimate for the slope associated with the term births_total in that model (β*). We then consider, in turn, that each simulation scenario may have generated the data. Each simulation scenario is thus considered as a null hypothesis which we aim at testing. Such a test is traditionally referred to as a goodness-of-fit test⁶⁸. The result of such a test, a p-value, answers the question: what is the probability of obtaining a value equal to, or more extreme than, the statistic of interest, if the null hypothesis were true? The rejection of the null hypothesis by the test (i.e. a p-value ≤ 0.05) signifies a rejection of the null hypothesis, and thus, here, the rejection of a simulation scenario which represents a particular mechanism, or combination of mechanisms. In contrast, a large (i.e. non-significant) p-value would here denote support for the simulation scenario under consideration.

A first candidate, as a statistic of interest to build our goodness-of-fit test, is the slope β*. However, when viewed as a goodness-of-fit test, the direct comparison of the observed and simulated slopes may be conservative when other life-history parameters are fitted to the data. This is because the data tend to be more likely given parameter values fitted to the data than given the actual (unknown) parameter values that generated the data. The goodness-of-fit test is however only guaranteed to provide uniformly distributed p-values (a feature necessary for the correctness of any null hypothesis testing) when samples are drawn under the latter parameter values. This is a general issue in statistics which has also been discussed long ago, for example, when the data-generating process is the normal distribution and a Kolmogorov–Smirnov test of goodness-of-fit is applied⁶⁹. We thus designed and validated a specific procedure to correct for such bias while testing each simulation scenario (Supplementary Notes). In the text, we only report outcomes from this unbiased goodness-of-fit test (for details, see Supplementary Notes and Supplementary Table 13).

Studying the effect of twinning propensity on the number of offspring using simulations

To study how twinning propensity influences the total number of offspring that mothers produced during their lifetime, we ran two sets of simulations, each with 100 replicates. In the first set, we ran the simulation as described in the section “Simulating the life history of mothers” using the fits of the models associated with the simulation scenario PIS (i.e. fits of models 4, 5 and 12). In the second set, we did the same, except that we modified the intercept of the model predicting twinning events (fit of the model 12) by adding 2.5 to its intercept. We also tried other values, some smaller (e.g. 0.25), some larger (e.g. 5), to make sure that the magnitudes of the change of the intercept did not impact our qualitative statements. For each set of replicates, we extracted the twinning rate, the twinner rate, the mean number of offspring produced and the mean total number of births. We report the means of these metrics, as well as the 95% Central Range from simulation replicates (CR_95%), which we directly computed by extracting the corresponding quantiles from the distribution generated by the replicates.

Realism of the simulations

We checked that the simulated life history closely matched that of the real mothers represented in our dataset beyond what is captured by the relationship between twinning and fertility. To do so, we compared different metrics related to fertility and twinning between the real and simulated data. We chose to perform this comparison under the simulation scenario PIS since it produces the best goodness-of-fit. The results of this quality check confirm that our simulations represent the reproductive lives of the mothers appropriately (Supplementary Fig. 6).

Studying the effect of mortality on the number of offspring using simulations

To account for the fact that not all offspring have the same expected survival, we also applied a survival weight to each simulated offspring before averaging the numbers for a given simulation set (baseline twinning propensity or enhanced, see Results). We used as weights the estimates for the probability of offspring survival between birth and adulthood provided by two publications associated with some of the data we used there. Specifically, following Helle et al.⁸, we used a weight of 0.603 for twins, 0.838 for singletons from twinners and 0.815 for singletons from non-twinners. Alternatively, following Haukioja et al.³, we used a weight of 0.337 for twins and 0.706 for singletons from all mothers.

Implementation details

All statistical analyses were performed in R version 4.1⁷⁰. The main R packages we used were spaMM⁶³ version 3.9.40 for the fit of all the statistical models, boot^71,72 version 1.3-28 for the computation of confidence intervals based on parametric bootstraps, and R6⁷³ version 2.5.1 for defining the object used to run the simulations. The DESCRIPTION file from our package twinR (see Code availability) lists the additional R packages required for this project (e.g. those used for plotting and data manipulation).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.