Globally invariant metabolism but density-diversity mismatch in springtails
Data reportingThe data underpinning this study is a compilation of existing datasets and therefore, no statistical methods were used to predetermine sample size, the experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. The measurements were taken from distinct samples, repeated measurements from the same sites were averaged in the main analysis.Inclusion & ethicsData were primarily collected from individual archives of contributing co-authors. The data collection initiative was openly announced via the mailing list of the 10th International Seminar on Apterygota and via social media (Twitter, Researchgate). In addition, colleagues from less explored regions (Africa, South America) were contacted via personal networks of the initial authors group and literature search. All direct data providers who collected and standardised the data were invited as co-authors with defined minimum role (data provision and cleaning, manuscript editing and approval). For unpublished data, people who were directly involved in sorting and identification of springtails, including all local researchers, were invited as co-authors. Principal investigators were normally not included as co-authors, unless they contributed to conceptualisation and writing of the manuscript. All co-authors were informed and invited to contribute throughout the research process—from the study design and analysis to writing and editing. The study provided an inclusive platform for researchers around the globe to network, share and test their research ideas.Data acquisitionBoth published and unpublished data were collected, using raw data whenever possible entered into a common template. In addition, data available from Edaphobase47 was included. The following minimum set of variables was collected: collectors, collection method (including sampling area and depth), extraction method, identification precision and resources, collection date, latitude and longitude, vegetation type (generalized as grassland, scrub, woodland, agriculture and other for the analysis), and abundances of springtail taxa found in each soil sample (or sampling site). Underrepresented geographical areas (Africa, South America, Australia and Southeast Asia) were specifically targeted by a literature search in the Web of Science database using the keywords ‘springtail’ or ‘Collembola’, ‘density’ or ‘abundance’ or ‘diversity’, and the region of interest; data were acquired from all found papers if the minimum information listed above was provided. All collected datasets were cleaned using OpenRefine v3.3 (https://openrefine.org) to remove inconsistencies and typos. Geographical coordinates were checked by comparing the dataset descriptions with the geographical coordinates. In total, 363 datasets comprising 2783 sites were collected and collated into a single dataset (Supplementary Fig. 1).Calculation of community parametersCommunity parameters were calculated at the site level. Here, we defined a site as a locality that hosts a defined springtail community, is covered by a certain vegetation type, with a certain management, and is usually represented by a sampling area of up to a hundred metres in diameter, making species co-occurrence and interactions plausible. To calculate density, numerical abundance in all samples was averaged and recalculated per square metre using the sampling area. Springtail communities were assessed predominantly during active vegetation periods (i.e., spring, summer and autumn in temperate and boreal biomes, and summer in polar biomes). Our estimations of community parameters therefore refer to the most favourable conditions (peak yearly densities). This seasonal sampling bias is likely to have little effect on our conclusions, since most springtails survive during cold periods38,48. Finally, we used mean annual soil temperatures49 to estimate the seasonal mean community metabolism (described below) and tested for the seasonal bias in additional analysis (see Linear mixed-effects models).All data analyses were conducted in R v. 4.0.250 with RStudio interface v. 1.4.1103 (RStudio, PBC). Data was transformed and visualised with tidyverse packages51,52, unless otherwise mentioned. Background for the global maps was acquired via the maps package53,54. To calculate local species richness, we used data identified to species or morphospecies level (validated by the expert team). Since the sampling effort varied among studies, we extrapolated species richness using rarefaction curves based on individual samples with the Chao estimator51,52 in the vegan package53. For some sites, sample-level data were not available in the original publications, but site-level averages were provided, and an extensive sampling effort was made. In such cases, we predicted extrapolated species richness based on the completeness (ratio of observed to extrapolated richness) recorded at sites where sample-level data were available (only sites with 5 or more samples were used for the prediction). We built a binomial model to predict completeness in sites where no sample-level data were available using latitude and the number of samples taken at a site as predictors: glm(Completeness~N_samples*Latitude). We found a positive effect of the number of samples (Chisq = 1.97, p = 0.0492) and latitude (Chisq = 2.07, p = 0.0391) on the completeness (Supplementary Figs. 17–19). We further used this model to predict extrapolated species richness on the sites with pooled data (435 sites in Europe, 15 in Australia, 6 in South America, 4 in Asia, and 3 in Africa).To calculate biomass, we first cross-checked all taxonomic names with the collembola.org checklist55 using fuzzy matching algorithms (fuzzyjoin R package56) to align taxonomic names and correct typos. Then we merged taxonomic names with a dataset on body lengths compiled from the BETSI database57, a personal database of Matty P. Berg, and additional expert contributions. We used average body lengths for the genus level (body size data on 432 genera) since data at the species level were not available for many morphospecies (especially in tropical regions), and species within most springtail genera had similar body size ranges. Data with no genus-level identifications were excluded from the analysis. Dry and fresh body masses were calculated from body length using a set of group-specific length-mass regressions (Supplementary Table 1)58,59 and the results of different regressions applied to the same morphogroup were averaged. Dry mass was recalculated to fresh mass using corresponding group-specific coefficients58. We used fresh mass to calculate individual metabolic rates60 and account for the mean annual topsoil (0–5 cm) temperature at a given site61. Group-specific metabolic coefficients for insects (including springtails) were used for the calculation: normalization factor (i0) ln(21.972) [J h−1], allometric exponent (a) 0.759, and activation energy (E) 0.657 [eV]60. Community-weighted (specimen-based) mean individual dry masses and metabolic rates were calculated for each sample and then averaged by site after excluding 10% of maximum and 10% of minimum values to reduce impact of outliers. To calculate site-level biomass and community metabolism, we summed masses or metabolic rates of individuals, averaged them across samples, and recalculated them per unit area (m2).Parameter uncertaintiesOur biomass and community metabolism approximations contain several assumptions. To account for the uncertainty in the length-mass and mass-metabolism regression coefficients, in addition to the average coefficients, we also used maximum (average + standard error) and minimum coefficients (average—standard error; Supplementary Table 1) in all equations to calculate maximum and minimum estimations of biomass and community metabolism reported in the main text. Further, we ignored latitudinal variation in body sizes within taxonomic groups62. Nevertheless, latitudinal differences in springtail density (30-fold), environmental temperature (from −16.0 to +27.6 °C in the air and from −10.2 to +30.4 °C in the soil), and genus-level community compositions (there are only few common genera among polar regions and the tropics)55 are higher than the uncertainties introduced by indirect parameter estimations, which allowed us to detect global trends. Although most springtails are concentrated in the litter and uppermost soil layers20, their vertical distribution depends on the particular ecosystem63. Since sampling methods are usually ecosystem-specific (i.e. sampling is done deeper in soils with developed organic layers), we treated the methods used by the original data collectors as representative of a given ecosystem. Under this assumption, we might have underestimated the number of springtails in soils with deep organic horizons, so our global estimates are conservative and we would expect true global density and biomass to be slightly higher. To minimize these effects, we excluded sites where the estimations were likely to be unreliable (see data selection below).Data selectionOnly data collection methods allowing for area-based recalculation (e.g. Tullgren or Berlese funnels) were used for analysis. Data from artificial habitats, coastal ecosystems, caves, canopies, snow surfaces, and strong experimental manipulations beyond the bounds of naturally occurring conditions were excluded (Supplementary Fig. 1). To ensure data quality, we performed a two-step quality check: technical selection and expert evaluation. Collected data varied according to collection protocols, such as sampling depth and the microhabitats (layers) considered. To technically exclude unreliable density estimations, we explored data with a number of diagnostic graphs (Supplementary Table 2; Supplementary Figs. 12–20) and filtered it, excluding the following: (1) All woodlands where only soil or only litter was considered; (2) All scrub ecosystems where only ground cover (litter or mosses) was considered; (3) Agricultural sites in temperate zones where only soil with sampling depth 90% of cases were masked on the main maps; for the map with density-species richness visualisation, two corresponding masks were applied (Fig. 2).To estimate spatial variability of our predictions while accounting for the spatial sampling bias in our data (Fig. 1a) we performed a spatially stratified bootstrapping procedure. We used the relative area of each IPBES79 region (i.e., Europe and Central Asia, Asia and the Pacific, Africa, and the Americas) to resample the original dataset, creating 100 bootstrap resamples. Each of these resamples was used to create a global map, which was then reduced to create mean, standard deviation, 95% confidence interval, and coefficient of variation maps (Supplementary Figs. 4–7).Global biomass, abundance, and community metabolism of springtails were estimated by summing predicted values for each 30 arcsec pixel10. Global community metabolism was recalculated from joule to mass carbon by assuming 1 kg fresh mass = 7 × 106 J80, an average water proportion in springtails of 70%58, and an average carbon concentration of 45% (calculated from 225 measurements across temperate forest ecosystems)81. We repeated the procedure of global extrapolation and prediction for biomass and community metabolism using minimum and maximum estimates of these parameters from regression coefficient uncertainties (see Parameter uncertainties).Path analysisTo reveal the predictors of springtail communities at the global scale, we performed a path analysis. After filtering the selected environmental variables (see above) according to their global availability and collinearity, 13 variables were used (Supplementary Fig. 9b): mean annual air temperature, mean annual precipitation (CHELSA database67), aridity (CGIAR database68), soil pH, sand and clay contents combined (sand and clay contents were co-linear in our dataset), soil organic carbon content (SoilGrids database73), NDVI (MODIS database72), human population density (GPWv4 database74), latitude, elevation69, and vegetation cover reported by the data providers following the habitat classification of European Environment Agency (woodland, scrub, agriculture, and grasslands; the latter were coded as the combination of woodland, scrub, and agriculture absent). Before running the analysis, we performed the Rosner’s generalized extreme Studentized deviate test in the EnvStats package82 to exclude extreme outliers and we z-standardized all variables (Supplementary R Code).Separate structural equation models were run to predict density, dry biomass, community metabolism, and local species richness in the lavaan package83. To account for the spatial clustering of our data in Europe, instead of running a model for the entire dataset, we divided the data by the IPBES79 geographical regions and selected a random subset of sites for Eurasia, such that only twice the number of sites were included in the model as the second-most represented region. We ran the path analysis 99 times for each community parameter with different Eurasian subsets (density had n = 723 per iteration, local species richness had n = 352, dry biomass had n = 568, and community metabolism had n = 533). We decided to keep the share of the Eurasian dataset larger than other regions to increase the number of sites per iteration and validity of the models. The Eurasian dataset also had the best data quality among all regions and a substantial reduction in datasets from Eurasia would result in a low weight for high-quality data. We additionally ran a set of models in which the Eurasian dataset was represented by the same number of sites as the second-most represented region, which yielded similar effect directions for all factors, but slightly higher variations and fewer consistently significant effects. In the paper, only the first version of analysis is presented. To illustrate the results, we averaged effect sizes for the paths across all iterations and presented the distribution of these effect sizes using mirrored Kernel density estimation (violin) plots. We marked and discussed effects that were significant at p More