Stochastic models support rapid peopling of Late Pleistocene Sahul

Cellular-automaton framework

We constructed the cellular-automaton model in the open-access R statistical computing environment (cran.rproject.org). We provide all code, data and instructions to repeat the analysis⁶⁵, which can be run on any desktop computer. Our spatial model is based on a 0.5° × 0.5° raster grid of Sahul from 0.5 to 43.0° S latitude, and 110.5 to 153.5° E longitude (86 rows and 87 columns). The land area of Sahul changes with fluctuating sea levels, so we estimated exposed land in 1000-year time slices to follow our available hindcasts of carrying capacity (see ‘Carrying capacity’ below) based on a digital elevation model and estimated sea-level change over the period of interest (from 85 to 40 ka; see Scenarios). We used the ETOPO1 global relief model of Earth’s surface⁶⁶ to estimate the exposed landmass of Sahul through time. To reconstruct the landmass changes of Sahul every 1000 years, we applied sea-level variability outputs⁶⁷ to the ETOPO1 model. We also included fluctuations in Lake Carpentaria that could potentially act as a natural barrier for human movement over time. We modified the contour of the lake based on modelled sea-level changes⁶⁸ applied to the digital elevation model.

From an initially peopled cell (see Scenarios), the new population can grow following a Ricker population-dynamics model, and emigrate to adjacent cells following stochastically resampled rules of dispersal; likewise, each cell can receive immigrants from adjacent cells following similar dispersal rules (see ‘Emigration and immigration’, and ‘Long-distance dispersal’).

Population-dynamics model

Each cell within the grid acts as a particular sub-population unit within the overall dynamics of Sahul, and the summary information provided at the end of a simulation is an overall expression of all cells. The change in human abundance (N) within each cell is governed by the following phenomenological (Ricker) equation of population dynamics:

$$N_{i,j,t + 1} = N_{i,j,t}e^{r_mleft( {1 – frac{{N_{i,j,t}}}{{K_{i,j,t}}}} right)} – left( {E_{i,j,t} – I_{i,j,t}} right)$$

(1)

where i is the cell row number in the 0.5° × 0.5° latitude lattice, j is the cell column number, t is the time interval in units of human generations (1 g = 27.9 years)², N_{i,j,t + 1} is the number of individuals in cell i, j at the next time interval (t + 1), N_{i, j, t} is the number of individuals in cell i, j at time interval t, r_m is the maximum rate of population increase when resources are not limiting, K_{i, j, t} is the cell-specific carrying capacity (see ‘Carrying capacity’ below), and the E_{i, j, t} and I_{i, j, t} parameters represent the number of individuals emigrating from and immigrating into the focal cell i, j per time interval t, respectively (see ‘Emigration and immigration’). As an estimate of r_m, we set the age-structured Leslie matrix for Aboriginal hunter–gatherers² to have a survival probability (subdiagonal matrix entries) all equal to 1 (complete survival in every age class), and then took the log_e of that matrix’s dominant eigenvalue to the power of g multiplied by 2 as the generationally scaled r_m estimate required for Eq. (1). Finally, we imposed a beta-resampled additional mortality parameter M_MVP = 0.2 for cells with a population size <100 individuals. This threshold of N_MVP = 100 is derived from the ecological concept of minimum viable population size⁶⁹ where ~100 effective individuals are required to avoid inbreeding depression⁷⁰. This assumes that individual fitness declines when populations decline below the minimum size due to inbreeding depression from non-random mating and other Allee effects related to social disruption.

Carrying capacity

We constructed a theoretical carrying capacity (K) for each cell from a hindcasted estimate of net primary production based on the LOVECLIM climate reconstruction^2,3,35. In this case, we define carrying capacity as the upper limit to total abundance⁷¹, which is expressed mathematically as the point at which net population growth becomes stable (r = 0)^72,73. LOVECLIM is a three-dimensional Earth-system model of intermediate complexity³⁵ that includes representations of the atmosphere, ocean and sea ice, land surface (including vegetation), ice sheets and the carbon cycle. LOVECLIM produces climates over the past 800 ka in 1000-year time-averaged increments (layers) that we downscaled (using a bilinear interpolation)^74,75 from a spatial resolution of 5.625° × 5.625° to 0.5° × 0.5°. For each grid cell and each 1000-year layer, we extracted net primary production⁷⁶ (kg C m⁻² year^-1) as the comprehensive indicator of relative carrying capacity through time². To translate net primary production into a carrying capacity expressed in units of humans the landscape was capable of supporting, we used the predicted relationship between net primary production and human density for hunter–gatherer societies³⁷. Here, we rescaled the net primary production values over all grid cells to concur with the minimum (0.018 km⁻²) and maximum (1.152 km⁻²) human densities, multiplied by cell area (3080.25 km²) to give per-cell K. We then linearly interpolated these values between the 1000-year layers for each cell.

This approach assumes a linear relationship between K and net primary production (Supplementary Fig. 2). However, others have suggested a non-linear relationship where maximum K occurs at mid-range net primary production, because past humans possibly struck a compromise between high productivity and ease of passage and/or visibility to hunt prey by tending towards ecotones of mid-range productivity^56,57; this type of non-linear relationship has also been identified between herbivore biomass and mean annual rainfall across sub-Saharan Africa⁷⁷. We therefore also applied a 180°-rotated parabola model between K and net primary production (P_p) of the form:

$$K = a(P_p – h)^2 + K_{{mathrm{max}}}$$

(2)

where we set K_max for the rotated parabolic relationship at 0.5K_max for the linear relationship, a = −3, h = the median of P_p, and where we scaled the rotated parabolic ΣK_max across all cells and temporal layers to equal ΣK_max of the linear relationship (because of a higher frequency of cells with mid-range compared to high net primary production) (Supplementary Fig. 2). This approach assumes a logarithmic growth to and decline from a peak, so we also considered a third non-linear relationship—a reciprocal quadratic yield density—between K and P_p of the form:

$$K = frac{{P_p}}{{a + bP + cP_p^2}}$$

(3)

where a = 200, b = 0.6 and c = 0.2, which instead assumes a logistic growth to and decline from a more pronounced peak (Supplementary Fig. 2). Whether we invoked the linear, rotated parabolic or reciprocal quadratic yield relationship (see Scenarios), we Poisson-resampled the resultant K for each cell per generational time step to simulate spatio-temporal uncertainty in K.

Emigration and immigration

We reasoned that emigration out of a focal cell i, j to its eight immediately neighbouring cells would be a function of the gradient in K_t between two cells⁴⁴, as well as how close the abundance N_{i, j, t} of cell i, j was to K_{i, j, t}. If the Poisson-resampled K_t ratio between two adjacent cells (K_rel = K_{i, j, t}/K_{i + y, j + x, t}, where x and y are integers ranging from −1 to 1 to define the immediate one-cell neighbourhood of the focal cell i, j) was <1, and N_{i, j, t}/K_{i, j, t} was > a random uniform proportion between 0.3 and 0.7, then emigration ensued. The values of 0.3 and 0.7 originate from Birdsell⁷⁸ who stated that group ‘budding off’ or ‘fissioning’ occurred when a population reached 30–70% of carrying capacity. We further imposed an exponential decay function to describe the declining probability of emigration (Pr(E)) as the ratio of K_{i, j, t}/K_{i + y, j + x, t} = K_rel increased towards 1 (Supplementary Fig. 6), where:

$$Pr left( E right) = e^{ – 3.2K_{{mathrm{rel}}}}$$

(4)

For the size of the emigrating group N_mig, we assumed that this represented a beta-resampled proportion P_mig = 1/3, such that N_mig is centred on N_{i, j, t} × P_mig (with SD = 0.05N_{i, j, t} × P_mig) based on the observation that when coastal populations of Aboriginal Australians fission, the population tends to separate into one larger and one smaller group, with the latter emigrating⁷⁹. We applied this calculation for each of the eight cell comparisons, removing those already emigrated from each cell’s subsequent estimate (moving in sequence through directions NW, N, NE, W, E, SW, S and SE of focal cell i, j). Likewise, when K_{i, j, t}/K_{i + y, j + x,t} > 1, immigration into cell i, j occurred following the same movement rules as for emigration.

Long-distance dispersal

We used the allometric relationship of natal dispersal for omnivorous and herbivorous mammals⁴⁰ to predict a dispersal probability for humans. Assuming a mean adult mass of M = 50 kg, maximum natal dispersal distance D_m is estimated as aM^b, where a = 3.31 ± 1.17 and b = 0.65 ± 0.05 for omnivores and herbivores combined⁴⁰. This produced an estimated maximum dispersal distance D_m ranging from 22.4 to 69.3 km. As a maximum dispersal range, this compares well to the average mobility of African hunter–gatherers of 1400–3900 km²/generation³⁹ (equivalent to a radius of 21.1–35.2 km assuming a perfect circle), and the 0.4–1.1 km yr⁻¹ (11.2–30.7 km/generation) estimates for Palaeolithic human expansions in northern Europe⁴¹. Also, Gould³⁸ reported journeys by Aboriginal Australians of 400 to 560 km as ‘not unusual’ and perhaps the greatest mobility ever recorded, moving as many as nine times in three months, and covering an area of ~2600 km² (radius = 28.8 km).

Next, we used the estimated probability of dispersal (Pr(d_max)) of d exceeding multiples (1 to 10) of one cell width (0.5 × 111.12 = 55.6 km) as (Pr left( {d_{{mathrm{max}}}} right) = e^{ – d/aM^b}) (Supplementary Fig. 7). However, there is evidence globally that the territory size of hunter–gatherer groups is strongly related to local productivity, with a greater need to expand foraging areas as productivity declines^19,80. Using territory size and rainfall data from Hiscock⁸⁰, we assumed the same relative change in rainfall applied to net primary productivity, but shifted the power–law relationship upwards to match the slope of the upper limit of maximum dispersal distance (Supplementary Fig. 7b). Thus, for every tenfold decrease in relative net primary production, maximum dispersal distance increases by 12.7 times (Supplementary Fig. 7b). Once a long-distance dispersal event occurred, we Poisson-resampled the maximum dispersal distance to provide a δx and a δy to move from the focal cell in cell units (including a random direction: east–west for δx, and north–south for δy). The size of the long-distance-dispersing population followed the same rules as for neighbouring-cell emigration.

Distance-to-water limitation

While territory size, and hence, maximum dispersal distances increase with increasing aridity according to the relationships described above, there is evidence that human dispersal is ultimately limited by water availability¹⁵. This is likely to be even more relevant in Australia, the driest inhabited continent on Earth—indeed, estimated routes of gene flow among Aboriginal Australians suggest that the arid interior acted as a barrier to migration¹¹. We therefore invoked an additional limitation on dispersal by calculating a probability of realizing a long-distance dispersal event (P_l) according to the following equation previously designed to limit modelled species migrations⁸¹:

$$P_l = 1 – left( {frac{{D_l}}{{D_{H_2O}}}} right)^{Omega}$$

(5)

where D_l = the realized maximum dispersal distance generated from the algorithm described above, (D_{H_2O}) = the distance to water in units of map cells derived from the Australian Water Observations from Space dataset¹⁵, and Ω = the hydrological resistance parameter set arbitrarily to a value of 3 to invoke landscape-scale resistance to movement only in the driest areas of Sahul per generational time step.

Ruggedness

We hypothesized that high landscape ruggedness (elevational gradient) might at least partially impede the progress of human expansion across the landscape⁴², so we tested this using data available in an ethnographic and environmental dataset compiled by Binford⁴². Available in the binford library⁸² in R, the dataset includes >200 variables measuring aspects of hunter–gatherer subsistence, mobility and social organization for 339 ethnographically documented groups. Given the evidence that mobility is a function of productivity^36,80, we constructed a simple linear model of annual movement varying with annual rainfall and the difference between maximum and minimum elevation within a 25- (40.2 km) mile radius of the group’s centroid (equivalent to an elevational gradient; i.e., ruggedness). Taking the cube root of annual movement and the difference in maximum and minimum elevation to comply with the assumption of Gaussian error distributions, the expected relationship between movement and rainfall prevailed, and there was a weak effect of elevational difference—a maximum of 1% reduction in annual movement (Supplementary Fig. 8). Expressed on the linear scale and standardizing annual movement and elevational difference to the range of 0–1 (assuming a constant median annual rainfall value), an exponential decay function of the form:

$$M_{{mathrm{red}}} = a + broot {3} of {{G_{{mathrm{rel}}}}}$$

(6)

where M_red = the proportion of expected total annual movement, a = 1.001116, b = −0.0104453 and G_rel = the standardized ruggedness from 0 to 1, described the reduction in annual movement rates up to a maximum of 1% (Supplementary Fig. 8). For all instances of emigration, immigration and long-distance dispersal, we assigned this function to the total number of people migrating for each cell based on its standardized ruggedness. We computed the topographic ruggedness index⁸³ as the difference in elevation between a given cell and its eight neighbouring central cells, based on our digital elevation model. For a given cell, we then squared each of the eight elevation difference values (to render them positive), and calculated the square root of the averages of the squares. We updated the spatial resolution of our results to 0.5° × 0.5° to match the other environmental layers.

Catastrophic mortality events

Palaeo-demographic investigations of past human populations suggest that long-term population growth rates were just slightly higher than zero as a result of episodes of catastrophic mortality arising from pandemics, natural disasters and violent conflicts occurring every few generations⁸⁴. This also agrees well with estimates of the probability of mass mortality events scaling to generation time for vertebrates (Pr(catastrophe) = 0.14 per generation)⁴³. We thus sampled binomially at Pr = 0.14 for whether a catastrophe occurred in each focal cell, and then beta-sampled the severity of the event centred on M_cat = 0.5 (SD = 0.5/10) to emulate a stochastic catastrophe event of 50% mortality, on average, for that cell⁴³.

However, we reasoned that a random allocation of catastrophes among cells across the entirety of Sahul was not realistic, for the reason that mortality events arising from natural disasters, warfare or disease outbreaks would likely be spatially aggregated. We therefore imposed a Thomas cluster process using the rThomas function from the spatstat R library⁸⁵, setting the intensity of the Poisson process of cluster centres κ to a linear relationship between the number of cells occupied per iteration and a vector ranging from 0.3 to 1.2, the standard deviation of random displacement along each coordinate axis of the grid of a given cell away from the cluster centre σ_scale = 0.015, and the mean number of cells per cluster μ = 0.6 × the mean dimension of the occupied grid per iteration. This combination of parameters led to a reasonable degree of spatial clustering while maintaining a random spread of cells around a catastrophe focal point, as well as maintaining the overall proportion of cells across the landscape experiencing a catastrophic mortality event ~0.14 per generational iteration.

Scenarios

We ran 120 scenarios (8 entry times, ×5 entry sequences, ×3 relationships between carrying capacity and net primary production) where we modified three main components of the stochastic simulations: (i) the timing of first entry to Australia (from 85 to 50 ka, in 5000-year increments), (ii) the place of entry (northern, southern, simultaneous northern and southern, northern followed by southern 2000 years later, or southern followed by northern 2000 years later) and (iii) the form of the relationship between hindcasted net primary productivity and human carrying capacity (linear, rotated parabolic or reciprocal quadratic yield density). We repeated each scenario 100 times to generate a per-cell confidence interval of time of first arrival. Here, we deemed a cell to have been populated for the first time once it received ≥100 individuals (N_first), which is considered the minimum viable effective population size to avoid inbreeding depression⁷⁰.

Comparison layers

To test the resultant outputs against real archaeological data, we compiled a conservative list of ages older than 30 ka obtained from across Sahul (see ‘Compiling reference archaeological dates’ in the Supplementary Information and Supplementary Data 1). However, the spatial coverage of these ages is highly uneven (Fig. 1a), so we applied a maximum-likelihood method to correct for the Signor–Lipps effect first developed by Solow⁸⁶ and adapted for spatial inference of both first-arrival and extinction patterns⁸⁷. While described in more detailed elsewhere⁸⁷, we briefly summarize the approach here.

To correct for the inherent spatial bias of dates in a landscape, let x₁,…x_n be the spatial locations of n dated specimens in an area W and a₁,…a_n their respective ages. The estimated average age M(x) of a putative date at a given location x is based on a standard kriging procedure⁸⁸ derived from the spatial covariance between the age of two dated specimens as a function of their respective pairwise distance, so that:

$$hat Mleft( x right) = mathop {sum}limits_{i le n} {w_ileft( x right)a_i}$$

(7)

where (w_1left( x right), ldots w_nleft( x right)) follows (mathop {sum}nolimits_{i le n} {w_ileft( x right) = 1}) and minimizes

$$mathop {sum}limits_{i le n} {w_ileft( x right)gamma left( {x_i – x_j} right) + mu = gamma (x – x_j)}$$

(8)

for j ≤ n, with μ being a Lagrange multiplier so that (mu = mathop {sum}nolimits_{i le n} {gamma (x_i – x)}) and γ is the variogram:

$$gamma left( u right) = frac{1}{2}Eleft( {aleft( z right) – aleft( {z + u} right)} right)^2 = sigma ^2 – cleft( u right)$$

(9)

where a(z) is the age a of a specimen found at a given location z (with z ∈ W), σ² is the variance of a(z) and c(u) is the covariance between a(z) and a(z + u), with any two locations in W separated by distance u.

We then modified Solow’s method⁸⁹ to correct for taphonomic bias, which assumes initially that the distribution of ages through time is uniform between a given age A₀ when individuals are assumed to be present, and the date of arrival A. For n ages of a given time series at a given location, the estimated terminal age (hat A) is therefore:

$$hat A = A_0 + frac{{n + 1}}{n}{max} _{i}left( {a_i – A_o} right)$$

(10)

To integrate this method into a spatial context, we estimated a preliminary age A_p across space assuming (hat Mleft( x right)) follows a stationary random field:

$$hat A_pleft( x right) = 2hat Mleft( x right) – A_0$$

(11)

But this generates a spatial bias (hat A_pleft( x right) – A(x)), in every (hat A_pleft( x right)), so we applied a simulation-based, spatial-bias-correction procedure⁹⁰ to estimate the bias generated by Eq. (11) at each x across W. The first step assumes that (hat A_pleft( x right)) is the ‘true’ date of the terminal event in x. Based on these (hat A_pleft( x right)), we generated k age samples (a^{(k)} = (a_1^{left( k right)}, ldots ,a_n^{(k)})) at the same locations x₁, … x_n following the same spatial pattern and characteristics as the dated record and sampled independently from a uniform distribution on ([A_{0,}hat A_pleft( {x_i} right)]). We then inferred (hat A^{(k)}(x)), the timing of the terminal event for the k new simulated time series and calculated an estimated total bias (hat Bleft( x right)) across all k ages:

$$hat Bleft( x right) = frac{1}{k}mathop {sum}limits_k {hat A^{left( k right)}left( x right) – hat A_pleft( x right)}$$

(12)

The final estimate of the timing of the terminal event of interest (hat Aleft( x right)) is the distribution of the preliminary dates (hat A_pleft( x right)) for every location x corrected by (hat Bleft( x right)), such that:

$$hat Aleft( x right) = hat A_pleft( x right) – hat Bleft( x right)$$

(13)

Because archaeological age estimates a_i are always associated with an inherent dating uncertainty σ_i, we assumed that age uncertainties are Gaussian and independent⁹¹ so that the probability density of the estimated age of the terminal event A(x) follows:

$$int_{{it{epsilon }}_1,…,{it{epsilon }}_n} {A_{left{ {a_1 + {it{epsilon }}_1, ldots ;a_n + {it{epsilon }}_n} right}}(x)prod _ig_ileft( {{it{epsilon }}_i} right)d{it{epsilon }}_1 ldots d{it{epsilon }}_n}$$

(14)

where g_i = the density of the Gaussian random variable with mean 0 and variance (sigma _i^2), and (A_{left{ {a_1 + {it{epsilon }}_1,, .., a_n + {it{epsilon }}_n} right}}(x)) = the final estimate at a given location x for a time series of age (a_1 + {it{epsilon }}_1,, .., a_n + {it{epsilon }}_n) located at x₁,..x_n, respectively. We applied the same Cook and Stefanski bias-correction procedure so that the k ages are independently sampled from a uniform distribution on ([A_{0,}hat A_pleft( {x_i} right)]) at the same locations x₁, … x_n following the same spatial pattern and characteristics as the dated record. This gives (a^{(k)} = (a_1 + {it{epsilon }}_1^{(k)},, .., a_n + {it{epsilon }}_n^{(k)})) with ({it{epsilon }}_i^{(k)}) independently sampled as a function of the probability density described in Eq. (14) to account for the dating uncertainty associated with each age. We then use the terminal ages (hat A^{(k)}(x) = A_{left{ {a_1^{left( k right)},,..,, a_n^{left( k right)}} right}}left( x right)) to estimate the bias in Eq. (12) and apply this to provide a corrected timing of (hat Aleft( x right)) for every x following Eq. (13).

Global sensitivity analysis

We designed a global sensitivity analysis to provide robust sensitivity measures of the probability of the time to saturation of the entire Sahul continent to variation in the underlying parameters of our stochastic model^54,92; this analysis does not repeat the scenario-testing parameters (i.e., time of entry, point(s) of entry, K–P_p relationship). For this global sensitivity analysis, we used the initial scenario parameters of a 50-ka entry at the southern route, the rotated parabolic relationship between K and P_p, and assuming a founding population size stochastically sampled between 1300 and 1500 people for the entry point².

Here, we ran the cellular-automaton spatial model 1000 times, randomly sampling 12 of its parameters uniformly for each iteration based on a Latin hypercube-sampling protocol⁵⁴. We set the 12 parameters to be sampled with ±50% variation on the median value used in the model (except for N_cat and max N/K with a maximum upper bound of 0.99, and for maximum D_cell—see below); these 12 parameters were: (i) the maximum generational rate of population increase r_m used to parameterize the phenomenological population-dynamics model per cell (range: 0.10–0.31), (ii) the minimum maximal dispersal distances D_m estimated from the allometric prediction (11–34 km), (iii) the cell-based maximum dispersal distance modifier (max D_cell), ranging from 1× to 5× the value set in the original model (10 cells), (iv) the cell-based minimum viable population size N_MVP (50–150 individuals) below which we set (v) an additional mortality parameter M_MVP (0.1–0.3), (vi) the hydrological resistance parameter Ω invoking landscape-scale resistance to movement only in the driest areas of Sahul per generational time step (1.5–4.5), (vii) the beta-resampled mean mortality of a cell during a catastrophe event M_cat (0.38–0.99), (viii) the beta-resampled proportion of people moving between cells when a migration event occurs P_mig (0.17–0.50), the beta-resampled (ix) minimum and (x) maximum ratios of N/K per cell invoking an emigration event (0.15–0.45 and 0.35–0.99, respectively), (xi) a resistance modifier R that modified the relationship between landscape ruggedness and maximum dispersal probability (0.5–1.5) and (xii) the population threshold N_first above which we determined a cell to be occupied for the calculation of the date of first arrival in a cell (50–150 individuals).

We chose to summarise the output of each of these 1000 parameter-sampled runs of the spatial model as the time taken to achieve continental saturation (i.e., the number of years taken from initial entry to occupy every cell in Sahul). In a separate analysis, we then tested the influence of the per-model run parameter values (predictors) on the time to continental saturation (response) using a boosted-regression tree⁹³ emulator with the function gbm.step in the dismo R library⁹⁴. Here, we set the error distribution family as Gaussian, the bag fraction to 0.75, the learning rate to 0.008, the tolerance to 0.0001, the maximum number of trees to 10,000 and the tree complexity to 2 (first-order interactions only). To assess the relative contribution of each of the 12 randomly sampled parameters to the time to spatial saturation, we calculated the boosted-regression tree metrics of relative influence⁵⁴.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source: Ecology - nature.com