Global diversity dynamics in the fossil record are regionally heterogeneous

Spatial standardisation workflow

To produce spatially-standardised fossil occurrence datasets which remain geographically consistent through time, we designed a subsampling algorithm which enforces consistent spatial distribution of occurrence data between time bins, while maximising data retention and permitting highly flexible regionalisation (Fig. 8). Our method was developed in light of, and takes some inspiration from, the spatial standardisation procedure of Close et al.². This method provides, within a given time bin, subsamples of occurrence data with threshold MST lengths. An average diversity estimate can be taken from this ‘forest’ of MSTs, selecting only those of a target tree length to ensure spatially-standardised measurements. It does not produce a single dataset across time bins, however; rather a series of discontinuous, bin-specific datasets which cannot then simply be concatenated as the spatial extents of each bin-specific forest are not standardised (despite each individual MST being so), even when MSTs are assigned to a specific geographic region, e.g. a continent or to a particular latitudinal band. This prevents estimation of rates, because such analyses require datasets that span multiple time bins and remain geographically consistent and spatially standardised through the time span of interest. This is the shortcoming that our method overcomes. The workflow consists of three main steps.

Fig. 8: Component steps of our spatial standardisation workflow.

A A spatial window (dotted lines) is used to demarcate the spatial region of interest, which may shift in a regular fashion through time to track that region. Data captured in each window is clipped to a target longitude–latitude range (orange lines). B The data forming the longitude–latitude extent is marked, then masked from further subsampling. C Data are binned using a hexagonal grid, the tally of occurrences in each grid cell taken, and a minimum spanning tree constructed from the grid cell centres. D The cells with the smallest amount of data are iteratively removed from the minimum spanning tree until a target tree length is reached.

Full size image

1. First, the user demarcates a spatially discrete geographic area (herein the spatial window) and a series of time bins into which fossil occurrence data is subdivided. Occurrence data falling outside the window in each time bin are dropped from the dataset, leaving a spatially restricted subsample (Fig. 8A). Spatial polygon demarcation is a compromise between the spatial availability of data to subsample and the region of interest to the user but allows creation of a dataset where regional nuances of biodiversity may be targeted. Careful choice of window extent can even aid subsequent steps by targeting regions that have a consistently sampled fossil record through time, even if the extent of that record fluctuates. To account for spatially non-random changes in the spatial distribution of occurrence data arising from the interlinked effects of continental drift, preservation potential and habitat distribution¹⁷, the spatial polygon may slide to track the location of the available sampling data through time. This drift is performed with two conditions. First, the drift is unidirectional so that the sampling of data remains consistent relative to global geography, rather than allowing the window to hop across the globe solely according to data availability and without biogeographic context. Second, spatial window translation is performed in projected coordinates so that its sampling area remains near constant between time bins, avoiding changes in spatial window area that could induce sampling bias from the species-area effect.

2. Next, subsampling routines are applied to the data to standardise its spatial extent to a common threshold across all time bins using two metrics: the length of the MST required to connect the locations of the occurrences; and the longitude–latitude extent of the occurrences. MST length has been shown to measure spatial sampling robustly as it captures not just the absolute extent of the data but also the intervening density of points, and so is highly correlated with multiple other geographic metrics¹⁶. MSTs with different aspect ratios may show similar total lengths but could sample over very different spatial extents, inducing a bias by uneven sampling across spatially organised diversity gradients¹⁶; standardising longitude–latitude extent accounts for this possibility. The standardisation methods can be applied individually or serially if both MST length and longitude–latitude range show substantial fluctuations through time. Data loss is inevitable during subsampling and may risk degrading the signals of origination, extinction and preservation. To address this issue, subsampling is performed to retain the greatest amount of data possible. During longitude–latitude standardisation, the range containing the greatest amount of data is preserved. During MST standardisation, occurrences are spatially binned using a hexagonal grid to reduce computational burden and to permit assessment of spatial density (Fig. 8B). The grid cells containing the occurrences that define the longitude–latitude extent of the data are first masked from the subsampling procedure so that this property of the dataset is unaffected, and then the occurrences within the grid cells at the tips of the MST are tabulated. Tip cells with the least data are iteratively removed (removal of non-tip cells may have little to no effect on the tree topology) until the target MST length is achieved (Fig. 8D), with tree length iteratively re-calculated to include the branch lengths added by the masked grid cells.

For both methods, the solution with the smallest difference to the target is selected and so both metrics may fluctuate around this target from bin to bin, with the degree of fluctuation depending upon the availability of data to exclude—larger regions that capture more data are more amenable to the procedure than smaller regions. Similarly, the serial application of both metrics reduces the pool of data available to the second method, although longitude–latitude standardisation is always applied first in the serial case so that the resultant extent will be retained during MST standardisation. Consequently, the choice of standardisation procedure and thresholds must be tailored to the availability and extent of data within the sampling region through time, along with the resulting degree of data loss. This places further emphasis on the careful construction of the spatial window in the first step. Threshold choice is also a compromise between data loss and consistency of standardisation across the dataset and so it may be necessary to choose targets that standardise spatial extent well for the majority of the temporal range of a dataset, rather than imposing a threshold that spans the entire data range but causes unacceptable data loss in some bins.

3. Once the time-binned, geographically restricted data have been spatially standardised, the relationship between diversity and spatial extent is scrutinised. After standardisation, it is expected that residual fluctuations in spatial extent should induce little or no change in apparent diversity. Bias arising from temporal variation in sampling intensity may still be present, so diversity is calculated using coverage-based rarefaction (also referred to as shareholder quorum subsampling^13,62,63), with a consistent coverage quorum from bin to bin. While coverage-based rarefaction has known biases, it remains the most accurate non-probabilistic means of estimating fossil diversity¹⁴. As such, we consider it the most appropriate method to assess the diversity of a region-level fossil dataset. The residual fluctuations in spatial extent may then be tested for correlation with spatially standardised, temporally corrected diversity. If a significant relationship is found, then the user must go back and alter the standardisation parameters, including the spatial window geometry and drift, the longitude–latitude threshold, and the MST threshold. Otherwise, the dataset is considered suitable for further analysis.

We implement our subsample standardisation workflow in R with a custom algorithm, spacetimestand, along with a helper function spacetimewind to aid the initial construction of spatial window. spacetimestand can then accept any fossil occurrence data with temporal constraints in millions of years before present and longitude–latitude coordinates in decimal degrees. Spatial polygon construction and binning is handled using the sp library⁶⁴, MST manipulation using the igraph and ape libraries^65,66, spatial metric calculation using the sp, geosphere and GeoRange libraries^67,68, hexagonal gridding using the icosa library⁶⁹, and diversity calculation by coverage-based rarefaction using the estimateD function from the iNEXT library⁷⁰. Next, we apply our algorithm to marine fossil occurrence data from the Late Permian to Early Triassic.

Data acquisition and cleaning

Fossil occurrence data for the Late Permian (260 Ma) to Early Jurassic (190 Ma) were downloaded from the PBDB on 28/04/21 with the default major overlap setting applied (an occurrence is treated as within the requested time span if 50% or more of its stratigraphic duration intersects with that time span), in order to minimise edge effects resulting from incomplete sampling of taxon ranges within our study interval of interest (the Permo-Triassic to Triassic-Jurassic boundaries). Other filters in the PBDB API were not applied during data download to minimise the risk of data exclusion. Occurrences from terrestrial facies were excluded, along with plant, terrestrial-freshwater invertebrate and terrestrial tetrapod occurrences (as these may still occur in marine deposits from transport) and occurrences from several minor and poorly represented phyla. Finally, non-genus level occurrences were removed, leaving 104,741 occurrences out of the original 168,124. Based on previous findings², siliceous occurrences were not removed from the dataset, despite their variable preservation potential compared to calcareous fossils. To increase the temporal precision of the dataset, occurrences with stratigraphic information present were revised to substage- or stage-level precision using a stratigraphic database compiled from the primary literature. To increase the spatial and taxonomic coverage of the dataset, the PBDB data were supplemented by an independently compiled genus-level database of Late Permian to Late Triassic marine fossil occurrences³⁶. Prior to merging, occurrences from the same minor phyla were excluded, along with a small number lacking modern coordinate data, leaving 47,661 occurrences out of an original 51,054. Absolute numerical first appearance and last appearance data (FADs and LADs) were then assigned to the occurrences from their first and last stratigraphic intervals, based on the ages given in A Geologic Timescale 2020⁷¹. Palaeocoordinates were calculated from the occurrence modern-day coordinates and midpoint ages using the Getech plate rotation model. Finally, occurrences with a temporal uncertainty greater than 10 million years and occurrences for which palaeocoordinate reconstruction was not possible were removed from the composite dataset, leaving 145,701 occurrences out of the original 152,402.

In the total dataset, we note that the age uncertainty for occurrences is typically well below their parent stage duration, aside for the Wuchiapingian and Rhaetian where the mean and quartile ages are effectively the same as the stage length (Fig. S44). This highlights the chronostratigraphic quality of our composite dataset, particularly for the Norian stage (~18-million-year duration) which has traditionally been an extremely coarse and poorly resolved interval in Triassic-aged macroevolutionary analyses. Taxonomically, most occurrences are molluscs (Fig. 8), which is unsurprising given the abundance of ammonites, gastropods and bivalves in the PBDB, but introduces the caveat that downstream results will be driven primarily by these clades. Foraminiferal and radiolarian occurrences together comprise the next most abundant element of the composite dataset, demonstrating that we nonetheless achieve good coverage of both the macrofossil and microfossil records, along with broad taxonomic coverage in the former despite the preponderance of molluscs.

Spatiotemporal standardisation

We chose a largely stage-level binning scheme when applying our spatial standardisation procedure for several reasons. First, the volume of data in each bin is greater than in a substage bin, providing a more stable view of occurrence distributions through time and increasing the availability of data for subsampling. Spatial variation at substage level might still affect the sampling of diversity, but the main goal of this study is to analyse origination and extinction rates where taxonomic ranges are key rather than pointwise taxonomic observations. Consequently, substage level variation in taxon presences likely amounts to noise when examining taxonomic ranges, making stage-level bins preferable in order maximise signal.

During exploratory standardisation trials, we found a large crash in diversity and spatial sampling extent during the Hettangian (201.3–199.3 Mya). No significant relationships with spatially-standardised diversity were found when the Hettangian bin was excluded from correlation tests, indicating its disproportionate effect in otherwise well-standardised time series. Standardising the data to the level present in the Hettangian would have resulted in unacceptable data loss so we instead accounted for this issue by merging the Hettangian bin with the succeeding Sinemurian bin, where sampling returns to spatial extents consistent with older intervals. While this highlights a limitation of our method, as the Hettangian is <2 Ma in length, it is reasonable to expect it would have a minor effect on taxonomic ranges in the long term, despite the magnitude of the sampling crash, and that any taxa surviving through the interval will be recorded in the much better sampled Sinemurian.

The occurrence data were plotted onto palaeogeographic maps to identify biogeographic regions that could feasibly be subsampled consistently through time. We identified five such regions which broadly correspond to major Permo-Jurassic seaboards and ocean basins: Circumtethys, Boreal, North Panthalassic and South Panthalassic, along with an unexpected set of marine occurrences from the Australian and New Zealand fossil record, which we term the Tangaroan (so named for the Maori god of the oceans). As Circumtethys is an extremely large region compared to the others, we subdivide it into eastern and western subdomains. While the extent of spatial regions reflects a compromise between biogeographic discretion and data availability and can theoretically be arbitrary, we note that most of our regions share a degree of correspondence with bioregions for the Permo-Triassic predicted from abiotic drivers of marine provinciality⁷², suggesting that they are biologically realistic to a certain extent. The major exception to this is our east-west division of Tethys compared to the north-south divide recovered by Kocsis et al.^32,33,72 as this was a compromise between biogeographic realism and data availability through time.

All regions extend for the full temporal range of the composite dataset, aside from the South Panthalassic, which covers the Late Triassic to Early Jurassic. Spatial windows were constructed for each region using the spacetimewind R function, then data were subsampled into each region under the described binning strategy using the spacetimestand R function. Four treatments were conducted for each polygon-binned dataset: no standardisation, standardisation by MST length, standardisation of the longitude–latitude extent and standardisation with both methods. For each treatment in each region, bin-wise diversity was calculated using coverage-based rarefaction at coverage levels of 40, 50, 60 and 70% (Figs. S8–S14). The relationships between diversity at each level of coverage with MST length, longitude range and latitude range were interrogated using one-tailed Pearson’s product moment and Spearman’s rho tests of correlation, with Benjamini–Hochberg correction for multiple comparisons⁷³. Spatial standardisation protocols for each region were then adjusted to eliminate significant correlations as needed.

Rate data and preservation model

Origination, extinction, and preservation rates were jointly estimated in a Bayesian framework using PyRate (v3.0). PyRate implements realistic preservation models that can vary through time and among taxa, yielding substantial increases in rate estimation accuracy over traditional methods. The method can also model occurrence-age uncertainty and provides an explicit model-based means of testing whether proposed rate shifts are significant²³. A comparable approach is the FBD-range model which accounts for unsampled diversity, something PyRate cannot do by default, but assumes an unrealistic constant preservation rate⁷⁴. An implementation of FBD-range is present experimentally within PyRate, but the complexity of the analysis currently renders this method computationally intractable for large datasets. Regardless, the FBD-range model and PyRate have been compared against one another, as well as against results from traditional methods, with FBD and PyRate showing largely comparable performance (although FBD remains more accurate under some scenarios of lower preservation rates and high turnover) and both FBD and PyRate outstrip traditional methods significantly⁷⁴.

PyRate has been criticised recently for only performing well when data availability is high and consistently sampled⁷⁵. This criticism, however, was based on simulated data with an underlying phylogenetic structure parameterised from a tree of ornithischian dinosaurs, whose fossil record is known to be inconsistent⁷⁶ and is at odds with the findings of simulations covering a broader range of turnover and preservation rates⁷⁴. PyRate is demonstrably subject to the pitfall of spatial variability in the fossil record, with regional analyses of the crocodylomorph fossil record indicating declining diversity⁷⁷, while global analysis with PyRate spuriously recovers increasing diversity driven by expansion of the geographic range of their fossil record^17,77,78. We avoid the issue of spatial variability with our standardisation procedure and the marine fossil record is well-sampled compared to the scenarios where PyRate otherwise begins to perform poorly. Therefore, we assert that PyRate is a suitable method for inferring diversity dynamics from our dataset and we elect not to use traditional methods (e.g. boundary crossers or three-timer rates⁷⁹).

We analysed datasets from the unstandardised, MST-standardised and MST + longitude–latitude-standardised treatments; as MST length is the most important control on spatial extent, the dataset with longitude–latitude standardisation only is expected to retain significant spatial bias. Ten age-randomised input datasets for each region and data treatment were generated in R with locality-age dependence (all occurrences from a locality are given the same randomised age), using collection number as a proxy for locality for PBDB-derived occurrences and geological section names for occurrences from the independent dataset. Locality-age dependence is both logically desirable as locality occurrences strictly represent a geographically localised and temporally discrete fauna (in idealised terms an assemblage from a single bedding plane) and which has been shown to improve precision in age estimates in other Bayesian dating procedures using fossil data⁸⁰.

The best fitting preservation model (homogenous, HPP; non-homogenous, NHPP; or time-variable homogenous Poisson process TPP) for each dataset was identified by maximum likelihood using the -PPmodeltest function of PyRate, with the best fitting model identified using the Akaike Information Criterion⁸¹. In addition to testing between the HPP, NHPP and TPP preservation models, we also tested between three TPP models of differing complexity: one with stage-level bins, one with stage-level bins and subdivision of the Norian stage into three sub-bins (the informal divisions Lacian, Alaunian and Sevatian), and one with substage-level bins and subdivision of the Norian stage into three sub-bins. For all datasets, the last binning scheme was found to be the best fitting, despite the greater number of model parameters (individual time-bin preservation rates) that it introduces. As well as using the TPP model of preservation through time with substage-level bins and threefold subdivision of the Norian, the preservation rate was also allowed to vary according to a gamma distribution (here discretized into eight rate multipliers^22,82) on taxon-wise preservation rates. While there is currently no way to test between preservation models with and without the gamma parameter in PyRate, it is a recommended addition due to the known empirical variability of preservation rates among taxa, especially for taxonomically diverse datasets and because it includes a single additional parameter in the model. In each analysis, the bin-wise preservation rates were assigned a gamma prior with fixed shape parameter set to 2, while the scale parameter was itself assigned a vague exponential hyperprior and estimated through MCMC (PyRate option -pP 2 0). This hierarchical approach provides a means of regularisation while allowing the prior on the preservation to adapt to the dataset²³. Finally, rate shifts outside the covered range of the data were excluded in each analysis to avoid edge effects during parameter estimates (PyRate option -edgeShift).

Rate estimation

Regardless of the chosen preservation model, a PyRate analysis is parameter-rich as the individual origination and extinction times for each taxon are jointly estimated along with the overall rates. PyRate additionally uses a reversible-jump Markov Chain Monte Carlo (rjMCMC) with a standard Metropolis Hastings algorithm to sample parameters across models with different numbers of rate shifts. This produces high computational burden, and models for larger sampling regions could not be estimated efficiently. PyRate can alternatively use an efficient Gibbs algorithm to sample from the posterior distribution of the parameters, producing preservation-corrected estimates of origination and extinction times that are virtually identical to those from the Metropolis Hastings algorithm, but with a coarse birth-death model that involves a dramatic loss of resolution in the resulting rate curves⁸³. A second programme, LiteRate, has been developed to permit origination and extinction rate estimation for taxonomically large datasets^24,25, gaining computational efficiency by implementing the same birth-death model used by PyRate with the rjMCMC and Metropolis Hastings algorithm, but without estimation of the complex preservation model. As we expect ranges in a fossil dataset to be truncated by variation in preservation rate through time, times of origination and extinction would be inaccurately estimated if LiteRate were run directly with a fossil dataset.

To overcome these methodological issues, we use a two-step procedure to permit efficient model estimation for taxonomically large fossil datasets. First, we use PyRate with the Gibbs algorithm to jointly estimate the parameters of the preservation model and the preservation rate-corrected estimates of origination and extinction times for each taxon. The origination and extinction time estimates are then supplied as input in LiteRate, leaving only the estimation of rates and rate shifts from the computationally efficient birth-death model. In summary, PyRate is used to perform the computationally expensive task of estimating the complex preservation model parameters and taxon-specific origination and extinction times using the computationally efficient Gibbs algorithm, while LiteRate is used to estimate the high-resolution birth-death model, rates and rate shifts for the taxonomically large dataset.

PyRate analyses for each region were run across sets of ten age-randomised replicates for five million generations, aside for the Tangaroan (10 million) and South Panthalassic (20 million), with sampling rates set to produce 10,000 samples of the posterior. Output datasets were assessed using Tracer (v1.7.1)⁸⁴ to determine suitable burn-in values by visually inspecting the MCMC trace, and to check for convergence by ensuring minimum effective sample sizes on all model parameters of 200 post burn-in for each analysis. Mean origination and extinction times were derived using the -ginput function of PyRate with a 10% burn-in, before being supplied to LiteRate. LiteRate analyses for each region were run across the 10 sets of mean origination and extinction times for 200 million generations, aside for the South Panthalassic (250 million). To incorporate age uncertainty into each analysis, logs from each age-randomised replicate were combined respectively for PyRate and LiteRate using the -combLog function of PyRate, taking 100 random samples from each log post 10% burn-in, to give 1000 samples of the posterior across all age-randomised replicated. Rates were then plotted at 0.1 million-year intervals and statistical significance of rate shifts recovered by the rjMCMC assessed using Bayes factors (log BF > 2: positive support, log BF > 6: strong support)⁸⁵ using the -plotRJ function of PyRate.

Probabilistic diversity estimation

Traditional methods of estimating diversity do not directly address uneven sampling arising from variation in preservation, collection and description rates, and their effectiveness is highly dependent on the structure of the dataset. We present an alternative method to infer corrected diversity trajectories based on the sampled occurrences and on the preservation rates through time and across lineages as inferred by PyRate, which we term mcmcDivE. The method implements a hierarchical Bayesian model to estimate corrected diversity across arbitrarily defined time bins. The method estimates two classes of parameters: the number of unobserved species for each time bin and a parameter quantifying the volatility of the diversity trajectory.

We assume the sampled number of taxa (i.e. the number of fossil taxa, here indicated with x_t) in a time bin to be a random subset of an unknown total taxon pool, which we indicate with D_t. The goal of mcmcDivE is to estimate the true diversity trajectory ({{{{{bf{D}}}}}},=,left{{D}_{1},{D}_{2},ldots ,{D}_{t}right}), of which the vector of sampled diversity ({{{{{bf{x}}}}}},=,{{x}_{1},{x}_{2},ldots ,{x}_{t}}) is a subset. The sampled diversity is modelled as a random sample from a binomial distribution⁸⁶ with sampling probability p_t:

$${x}_{t}, ,sim, {{{{{rm{Bin}}}}}}({D}_{t},{p}_{t})$$

(1)

We obtain the sampling probability from the preservation rate (q_t) estimated in the initial PyRate analysis. If the PyRate model assumes no variation across lineages the sampling probability based on a Poisson process is ({p}_{t},=,1,-,{{{{{rm{exp }}}}}}({-q}_{t},times, {delta }_{t})), where δ_t is the duration of the time bin. When using a Gamma model in PyRate, however, the q_t parameter represents the mean rate across lineages at time t and the rate is heterogeneous across lineages based on a gamma distribution with shape and rate parameters equal to an estimated value α.

To account for rate heterogeneity across lineages in mcmcDivE, we draw an arbitrarily large vector of gamma-distributed rate multipliers g₁, …, g_R ~ Γ(α,α) and compute the mean probability of sampling in a time bin as:

$${p}_{t},=,frac{1}{R}mathop{sum }limits_{i,=,R}^{R}1,-,{{{{{rm{exp }}}}}}(-{q}_{t},,times, {g}_{i},times, {delta }_{t})$$

(2)

We note that while q_t quantifies the mean preservation rate in PyRate (i.e. averaged among taxa in a time bin t), the mean sampling probability p_t will be lower than (1,-,{{{{{rm{exp }}}}}}({-q}_{t},times, {delta }_{t})) (i.e. the probability expected under a constant preservation rate equal to q_t) especially for high levels of rate heterogeneity, due to the asymmetry of the gamma distribution and the non-linear relationship between rates and probabilities. We sample the corrected diversity from its posterior through MCMC. The likelihood of the sampled number of taxa is computed as the probability mass function of a binomial distribution with D_i as the ‘number of trials’ and _pi as the ‘success probability’. To account for the expected temporal autocorrelation of a diversity trajectory⁸⁷, we use a Brownian process as a prior on the log-transformed diversity trajectory through time. Under this model, the prior probability of D_t is:

$$Pleft({{{{{rm{log }}}}}}left({D}_{t}right)right),{{{{{mathscr{ sim }}}}}},{{{{{mathscr{N}}}}}}({{{{{rm{log }}}}}}left({D}_{t,-,1}right),,sqrt{{sigma }^{2},,times, ,{delta }_{t}})$$

(3)

where σ² is the variance of the Brownian process. For the first time bin in the series, D_t = 0, we use a vague prior ({{{{{mathscr{U}}}}}}(0,infty )). Because the variance of the process is itself unknown and may vary among clades as a function of their diversification history, we assign it an exponential hyperprior Exp(1) and estimate it using MCMC. Thus, the full posterior of the mcmcDivE model is:

$$underbrace{P(D,{sigma }^{2}|x,q,alpha )}_{{{{{rm{posterior}}}}}}propto underbrace{P(x|D,q,alpha )}_{{{{{rm{likelihood}}}}}}times underbrace{P(D|{sigma }^{2})}_{{{{{rm{prior}}}}}}times underbrace{P({sigma }^{2})}_{{{{{rm{hyperprior}}}}}}$$

(4)

where ({{{{{bf{D}}}}}},=,{{D}_{0},{D}_{1},ldots ,{D}_{t}}) and ({{{{{bf{q}}}}}},=,{{q}_{0},{q}_{1},ldots ,{q}_{t}}) are vectors of estimated diversity, sampled diversity, and preservation rates for each of T time bins. We estimate the parameters D and σ² using MCMC to obtain samples from their joint posterior distribution. To incorporate uncertainties in q and α we randomly resample them during the MCMC from their posterior distributions obtained from PyRate analyses of the fossil occurrence data. While in mcmcDivE we use a posterior sample of q_t and α precomputed in PyRate for computational tractability of the problem, a joint estimation of all PyRate and mcmcDivE parameters is in principle possible, particularly for smaller datasets. mcmcDivE is implemented in Python v.3 and is available as part of the PyRate software package.

Simulated and empirical diversity analyses

We assessed the performance of the mcmcDivE method using 600 simulated datasets obtained under different birth-death processes and preservation scenarios. The settings of the six simulations (A–F) are summarised in Table S65 and we simulated 100 datasets from each setting. Since the birth-death process is stochastic and can generate a wide range of outcomes, we only accepted simulations with 100 to 500 species, although the resulting number of sampled species decreased after simulating the preservation process. From each birth-death simulation we sampled fossil occurrences based on a heterogeneous preservation process. Each simulation included six different preservation rates which were drawn randomly within the boundaries 0.25 and 2.5, with rate shifts set to 23, 15, 8, 5.3 and 2.6 Ma. To ensure that most rates were small (i.e. reflecting poor sampling), we randomly sampled preservation rates as:

$$q, sim ,exp left({{{{{mathscr{U}}}}}}left(log left(0.25right),,log left(2.5right)right)right)$$

(5)

In two of the five scenarios (D, F), we included strong rate heterogeneity across lineages (additionally to the rate variation through time), by assuming that preservation rates followed a gamma distribution with shape and rate parameters set to 0.5. This indicates that if the mean preservation rate in a time bin was 1, the preservation rate varied across lineages between <0.001 and 5 (95% interval). In one scenario (B), we set the preservation rate to 0 (complete gap in preservation) in addition to the temporal rate changes used in the other scenarios. Specifically, the preservation rate was set to 0 in two time intervals between 15 and 8 Ma and between 5.3 and 2.6 Ma.

We analyzed the occurrence data using PyRate to estimate preservation rates through time and infer the amount of rate heterogeneity across lineages. We ran 10 million MCMC generations using the TPP preservation rate model with gamma-distributed heterogeneity. We then ran mcmcDivE for 200,000 MCMC iterations assuming bins of 1-myr duration to estimate corrected diversity trajectories while resampling the posterior distributions of the preservation parameters inferred by PyRate. To summarise the performance of mcmcDivE we quantified the mean absolute percentage error computed as the absolute difference between true and estimated diversity averaged across all time bins and divided by the mean true diversity-through time, then used a one-tailed t-test to determine whether the mean absolute percentage error for the mcmcDivE estimate is significantly smaller than those for the other diversity estimation methods in each set of 100 simulations. We additionally computed the coefficient of determination (R²) between estimated and true diversity to assess how closely the estimated trends matched the true diversity trajectories. We compared the performance of the mcmcDivE estimates with a curve of raw sampled diversity (i.e. number of sampled species per 1 Myr time-bin), a range-through diversity trajectory based on first and last appearances of sampled species, and sampling-corrected trajectories estimated using coverage-based rarefaction (estimateD function in the iNEXT R package⁷⁰) and the squares extrapolator⁸⁸.

From our simulated results, we find that mcmcDivE provides accurate results under most settings and significantly better estimates (significantly smaller mean absolute percentage error; p < 0.0001 for all six sets of simulations) of the diversity-through time compared with raw diversity curves, range-through diversity trajectories or sampling-corrected estimates from coverage-based rarefaction, or extrapolation by squares (Fig. 9). The mean absolute percentage error averaged 0.13 (95% CI: 0.04–0.29) in simulations without across lineage rate heterogeneity (Fig. 9E), with a high correlation with the true diversity trajectory: R² = 0.93 (95% CI: 0.72–0.99). The diversity estimates remained accurate even in the presence of time intervals with zero preservation (Fig. 9B).

Fig. 9: Validation of mcmcDivE accuracy using six different birth-death preservation simulations.

Plots in the left column show an example of diversity trajectories for one simulation: the true diversity is indicated with the dashed line, green line and shaded areas represent the mcmcDivE estimates and 95% confidence intervals. Purple, orange and red lines show the range-through, raw diversity and squares-extrapolated trajectories, respectively. Blue lines and shaded areas show diversity from coverage-based rarefaction/SQS and 95% confidence intervals. Violin plots show the distributions (range, 1st and 3rd quartiles and median) of absolute percentage errors and coefficient of determination between true and estimated diversity calculated from 100 simulations in each setting. The scenarios (A–F) refer to different birth-death and preservation settings as described in Table S65. Source data are provided as a Source Data file.

Full size image

Simulations with rate heterogeneity across lineages (Fig. 9D, F) yielded higher mean absolute percentage errors (0.43, 95% CI: 0.24–0.55) while maintaining a strong correlation with the true diversity trajectory R² = 0.95 (95% CI: 0.85–0.99). This indicates that, while the absolute estimates of diversity are on average less accurate in the presence of strong rate heterogeneity across lineages (in addition to strong rate variation through time), the relative changes in diversity-through time are still accurately estimated. The increased relative error in these simulations is mostly linked with an underestimation of diversity throughout, which has been observed in other probabilistic methods to infer diversity in the presence of rate heterogeneity across lineages (Close et al.¹⁴). This, however, does not hamper the robust estimation of relative diversity trends using our method (Fig. 9D, F).

After validating the accuracy of the model, regional analyses of Triassic marine diversity were run for 1000,000 MCMC iterations at 1-myr intervals. We summarised the diversity estimates by calculating the median of the posterior samples and the 95% credible intervals.

Turnover estimation

Counts of unique taxa within a sample (geographic area or time bin) are a measure of diversity while the degree of taxonomic differentiation between two samples constitutes a measure of turnover. Taxonomic turnover through time, measured by successive comparison of the taxon pools in adjacent time bins, avoids the pitfall of cryptic turnover hidden within diversity or diversification rate curves as high extinction and origination rates will strongly increase taxonomic differentiation through time. We use the modified Forbes index (Forbes*)⁸⁹ with relative abundance correction (RAC)⁹⁰ as this combination of methods robustly accounts for both incomplete sampling in each sample and differing abundance distributions between a pair of samples, both of which can bias the apparent degree of similarity⁹⁰. RAC is a potentially computationally expensive procedure as it multiplies the number of null trials by the number of rounds of sampling standardisation applied per trial, and because comparison of multiple samples to return a distance matrix becomes exponentially more expensive with each added sample. To address this issue, we implement the RAC-adjustted Forbes* metric (converted to dissimilarity as 1 − Forbes*) using an efficient, parallelised C++ function with an Rcpp wrapper in R. We anticipate that our implementation, which performs orders of magnitude faster than the original version, will ease uptake of this method by other palaeobiologists. Occurrences in each region were first binned at stage level, then with twofold subdivision of the Anisian, Ladinian and Carnian and threefold subdivision of the Norian, using the occurrence midpoint ages. RAC-Forbes* dissimilarity was then calculated for each region between successive pairs of time bins with 100 null trials, and 100 sampling standardisation trials at a sampling quorum of 0.5 for each null trial and empirical estimate.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.