Mapping peat thickness and carbon stocks of the central Congo Basin using field data

Field-data collection

Fieldwork was conducted in DRC between January 2018 and March 2020. Ten transects (4–11 km long) were installed, identical to the approach in ref. ⁹, in locations that were highly likely to be peatland. These were selected to help test hypotheses about the role of vegetation, surface wetness, nutrient status and topography in peat accumulation (Fig. 1a and Supplementary Table 1). A further eight transects (0.5–3 km long) were installed to assess our peat mapping capabilities (Fig. 1a and Supplementary Table 1).

Every 250 m along each transect, land cover was classified as one of six classes: water, savannah, terra firme forest, non-peat-forming seasonally inundated forest, hardwood-dominated peat swamp forest or palm-dominated peat swamp forest. Peat swamp forest was classified as palm dominated when >50% of the canopy, estimated by eye, was palms (commonly Raphia laurentii or Raphia sese). In addition, several ground-truth points were collected at locations in the vicinity of each transect from the clearly identifiable land-cover classes water, savannah and terra firme forest.

Peat presence/absence was recorded every 250 m along all transects, and peat thickness (if present) was measured by inserting metal poles into the ground until the poles were prevented from going any further by the underlying mineral layer, identical to the pole method of ref. ⁹. In addition, a core of the full peat profile was extracted every kilometre along the ten hypothesis-testing transects, if peat was present, with a Russian-type corer (52 mm stainless steel Eijkelkamp model); these 63 cores were sealed in plastic for laboratory analysis.

Peat-thickness laboratory measurements

Peat was defined as having an organic matter (OM) content of ≥65% and a thickness of ≥0.3 m (sensu ref. ⁹). Therefore, down-core OM content of all 63 cores was analysed to measure peat thickness. The organic matter content of each 0.1-m-thick peat sample was estimated via loss on ignition (LOI), whereby samples were heated at 550 °C for 4 h. The mass fraction lost after heating was used as an estimate of total OM content (% of mass). Peat thickness was defined as the deepest 0.1 m with OM ≥ 65%, after which there is a transition to mineral soil. Samples below this depth were excluded from further analysis. Rare mineral intrusions into the peat layer above this depth, where OM < 65% for a sample within the peat column, were retained for further analysis. In total, 59 out of 63 collected cores had LOI-verified peat thickness ≥0.3 m.

The pole method used to estimate peat thickness in the field was calibrated against LOI-verified measurements by fitting a linear regression model between all LOI-verified and pole-method peat-thickness measurements sampled at the same location (93 sites across ROC and DRC, including 37 from ref. ⁹). Three measurements from DRC with a Cook’s distance >4× the mean Cook’s distance were excluded as influential outliers. Mean pole-method offset was significantly higher along the DRC transects (0.94 m) than along those in ROC (0.48 m; P < 0.001) due to the presence of softer alluvium substrate in river-influenced sites in DRC. We therefore added this grouping as a categorical variable to the regression. The resulting model (adjusted R² = 0.95; P < 0.001; Extended Data Fig. 3) was used to correct all pole-method measurements in each group for which no LOI-verified thickness was available: corrected peat thickness = −0.1760 + 0.8626 × (pole-method thickness) – 0.3284 × (country), with country dummy coded as ROC (0) and DRC (1).

Carbon-density estimates

To calculate carbon density (mass per unit area), estimates of carbon storage in each 0.1-m-thick peat sample (thickness × bulk density × carbon concentration) were summed to provide an estimate of total carbon density per core (MgC ha⁻¹), identical to ref. ⁹. We estimated carbon density for 80 peat cores (OM ≥ 65%, thickness ≥ 0.3 m), located every other kilometre along 18 transects, including 37 cores from the ten transects used for hypothesis testing in DRC and 43 cores from eight transects in ROC⁹.

Peat thickness of the 80 cores was obtained by laboratory LOI. To estimate peat bulk density, every other 0.1 m down-core, samples of a known peat volume were weighed after being dried for 24 h at 105 °C (n = 906). Bulk density (g cm⁻³) was then calculated by dividing the dry sample mass (g) by the volume of the sample taken from the peat corer dimensions (cm³). Within each core, linear interpolation was used to estimate bulk density for the alternate 0.1-m-thick samples of the core that were not measured.

For total carbon concentration (%), only the deepest core per transect, plus additional deep cores from the Lokolama transect (1) in DRC and Ekolongouma transect (3) in ROC (22 in total, 11 from DRC and 11 from ROC⁹) were sampled down-core. Every other 0.1-m-thick sample was measured using an elemental analyser (Elementar Vario MICRO Cube with thermal conductivity detection for all cores, except those from Boboka, Lobaka and Ipombo transects, which were analysed using Sercon ANCA GSL with isotope-ratio mass spectrometer detection, due to COVID-19 disruption). All samples (n = 422) were pre-dried for 48 h at 40 °C and ground to <100 μm using an MM301 mixer mill. Again, linear interpolation was used within each core for the alternate samples that were not measured.

The remaining 58 cores had less-intensive carbon concentration sampling. We therefore interpolated the carbon concentration for each 0.1-m-thick sample because well-sampled cores show a consistent pattern with depth: an increase to a depth of about 0.5 m followed by a long, very weak decline and finally a strong decline over the deepest approximately 0.5 m of the core⁹. We used segmented regression on the 22 well-sampled cores (segmented package in R, version 1.3–1) to parameterize the three sections of the core, using the means of these relationships to interpolate carbon concentrations for the remaining 58 cores, following ref. ⁹.

To estimate carbon density from modelled peat thickness across the basin, we developed a regression model between peat thickness and per-unit-area carbon density using the 80 sampled cores. We compared linear regressions for normal, logarithmic- and square-root-transformed peat thickness, selecting the model with the lowest corrected Akaike information criterion (AICc) and highest R². A linear model with square-root-transformed peat thickness was found to provide the best fit (R² = 0.86; P < 0.001; Extended Data Fig. 7). Bootstrapping was applied (boot package in R, version 1.3–25) to assess uncertainty around the regression.

Modelling peatland extent

Satellites cannot detect peat directly. We therefore mapped vegetation and used field-based associations between peat and vegetation to infer peat presence^9,29. Five land-cover classes were used for the purpose of peatland mapping: water, savannah, palm-dominated peat swamp forest, hardwood-dominated peat swamp forest and non-peat-forming forest. In this classification, field recordings of non-peat-forming seasonally inundated forest (<30 cm thickness of ≥65% OM) were grouped with field recordings of terra firme forest, which also does not form peat, to form the non-peat-forming forest class. Our field recordings of hardwood- or palm-dominated peat swamp forest, by definition, consist of all forest sites that form peat, including any seasonally inundated forest that forms peat (≥30 cm of ≥65% OM).

A total of 1,736 ground-truth data points were used: 172 in water, 476 in savannah, 632 in non-peat-forming forest (97 non-peat-forming seasonally inundated forest and 535 terra firme forest), 188 in palm-dominated peat swamp forest and 268 in hardwood-dominated peat swamp forest (Extended Data Fig. 1). These data come from eight sources (Supplementary Table 2): first, ground-truth locations collected for this study using a GPS (Garmin GPSMAP 64 s) at all transect sites in DRC for which a land-cover class was determined (382 points); second, published ground-truth data from nine transects in ROC (292 points)⁹; third, 299 GPS locations of known savannah and terra firme forest land-cover classes from archaeological research databases across the basin^30,31; fourth, 191 GPS locations from permanent long-term forest inventory plots of the African Tropical Rainforest Observation Network, mostly from terra firme forest³², retrieved from the ForestPlots database^33,34; fifth, 229 GPS data points from terra firme forest or savannah locations in and around Lomami National Park (R. Batumike, G. Imani and A. Cuní-Sanchez, personal communication); sixth, 24 published savannah data points in and around Lomami NP³⁵; seventh, 23 published locations of savannah, terra firme forest and palm- or hardwood-dominated peat swamp forest in DRC¹¹; eighth, 296 data points from Google Earth for unambiguous savannah and water sites (middle of lakes or rivers) distributed across the region.

We used nine remote-sensing products to map peat-associated vegetation (Supplementary Fig. 1). Eight of these are identical to those used by ref. ⁹: three optical products (Landsat 7 ETM + bands 5 (SWIR 1), 4 (NIR) and 3 (Red)); three L-band synthetic-aperture radar products (ALOS PALSAR HV, HH and HV/HH); and two topographic products (SRTM DEM (digital elevation model) void-filled with ASTER GDEM v.2 data and slope; acquisition date 2000). To this, we added a HAND-index (height above nearest drainage point), which significantly improved model performances (median MCC 79.7%, compared with 77.8% or 75.6% for just DEM or HAND alone, respectively; P < 0.001).

HAND was derived from the SRTM DEM with the algorithm from ref. ³⁶, using the HydroSHEDS global river network at 15 s resolution as reference product³⁷. Alternative NASADEM- or MERIT DEM-derived^38,39,40 combinations of DEM, HAND and slope were tested with an initial subset of data in R, while keeping all other remote-sensing products the same (median MCC: 79.0% and 75.1%, respectively), but did not significantly improve model performance compared with SRTM-derived products (80.9% median MCC; P < 0.001).

The Landsat bands are pre-processed, seamless cloud-free mosaics for ROC (composite of three years, 2000, 2005 and 2010) and DRC (composite of six years, 2005–2010)⁴¹. These mosaics performed better than more recent basin-wide automated cloud-free Sentinel-2 mosaics that we developed (bands 5, 8A and 11; composite of five years, 2016–2020), probably because they contain less directional reflectance artefacts (the median MCC of 80.9% for the pre-processed Landsat mosaics is significantly higher than the 78.1% for the Sentinel-2 mosaics, P < 0.005).

The ALOS PALSAR radar bands are mosaics of mean values of annual JAXA composites for the years 2007–2010 (ref. ⁹). More recent radar data (ALOS-2 PALSAR-2 HV, HH, HV/HH; 2015–2017) did not significantly improve model performances (median MCC 80.9% and 80.6%, respectively; P < 0.01). All remote-sensing products were resized to a common 50 m grid using a cubic convolution resampling method.

We then tested which classification algorithm to use, as more sophisticated algorithms might improve overall accuracy against our training dataset but might also reduce regional accuracy of the map in areas far from test data, critical in this case given large areas of the central Congo peatland region remain unsampled.

Three supervised classification algorithms were tested in order of increasing complexity: ML, support-vector machine (SVM) and RF. We assessed each classifier using both a random and spatial cross-validation (CV) approach^42,43,44. Random CV was implemented using stratified two-thirds Monte Carlo selection, whereby 1,000 times, we randomly selected two-thirds of all data points per class as training data, to be evaluated against the remaining one-third per class as testing data.

Spatial CV was implemented by grouping all transects data points in four distinct hydro-geomorphological regions: (1) transects perpendicular to the black-water Likouala-aux-Herbes River (n = 179 data points); (2) transects perpendicular to the white-water Ubangi River (n = 113); (3) transects perpendicular to the Congo River, intermediate between black and white water (n = 123); and (4) transects perpendicular to the black-water Ruki, Busira and Ikelemba rivers, plus other nearby transects (collectively named the Ruki group; n = 258). To each group we added ground-truth data points from other non-transect data sources (Supplementary Table 2) that belonged to the same map regions (n = 82, 27, 20 and 113, respectively). We then tested 1,000 times how well each classifier performs in each of the four regions when trained only on a stratified two-thirds Monte Carlo selection of the remaining data points (data points from the three other regional transect groups) plus ground-truth data points not associated with or near any transect group (n = 821; for example, the savannah and terra firme forest data points in Lomami National Park in DRC, which are far (> 300 km) from any transect group).

Model performance was based on MCC for binary peat/non-peat predictions (hardwood- and palm-dominated peat swamp forest classes combined into one peat class; water, savannah and non-peat-forming forest combined into one non-peat class). We compared MCC, rather than popular metrics such as Cohen’s kappa, F1 score or accuracy, because it is thought to be the most reliable evaluation metric for binary classifications^45,46. We also computed BA from random CV to compare with the first-generation map. While less robust than MCC, BA is independent of imbalances in the prevalence of positives/negatives in the data, thus allowing better comparison between classifiers trained on different datasets¹⁶. The best estimate of each accuracy metric or area estimate per model or region is the median value of 1,000 runs, alongside a 95% CI.

In the case of SVM and RF, random CV models were implemented in Google Earth Engine (GEE)⁴⁷ using all nine remote-sensing products. However, because ML is currently not supported by GEE, random CV with this algorithm was implemented in IDL-ENVI software (version 8.7–5.5), using a principal component analysis to reduce the nine remote-sensing products to six uncorrelated principal components to reduce computation time. All spatial CV models were implemented in R (superClass function from the RStoolbox package, version 0.2.6), with principal component analysis also applied in the case of ML only. All RF models were trained using 500 trees, with three input products used at each split in the forest (the default, the square root of the number of variables). All SVM models were implemented with a radial basis function kernel, with all other parameters set to default values.

Comparison of the ML, SVM and RF models with the model performance of ref. ⁹, using balanced accuracy from random CV, shows improved results only in the case of the ML classifier (Supplementary Table 3). Comparing MCC using the spatial CV approach, we found that the ML algorithm is also most transferable to regions for which we lack training data. While RF gives slightly better MCC with random CV, when no regions are omitted, spatial CV shows particularly poor predictive performance of this algorithm for the Congo and Ruki regions when trained on data from the other regions. SVM has the lowest MCC of all three classifiers with random CV and performs worst of all three in the Congo region with spatial CV.

In addition, applying spatial CV to the largely interfluvial basin region (ROC transects; n = 401) and the largely river-influenced region (DRC transects; n = 540) also shows RF performs poorly (Supplementary Table 3). This further supports selecting the ML algorithm to produce our second-generation peat-extent map of the central Congo peatlands. The final peatland-extent estimate is then obtained as the median value (alongside 95% CI) of the combined hardwood- and palm-dominated peat swamp forest extent from 1,000 ML runs, each time trained with two-thirds of the ground-truth data.

Modelling peat thickness

A map of distance from the peatland margins was developed in GEE using the median ML peat probability map, the ML map with a 50% peat probability threshold (>500 hardwood- or palm-dominated peat swamp predictions out of 1,000 runs). For each peat pixel in this binary classification, a cost function was used to calculate the Euclidean distance to the nearest non-peat pixel after speckle and noise were removed using a 5 × 5 squared-kernel majority filter. Using this distance map, transects were found to have markedly different relationships between peat thickness and distance from the peatland margin, that is, different slopes (n = 18; P < 0.001; Extended Data Fig. 4). The modest linear fit (R² = 41.0%; RMSE = 1.21 m) cautions against a uniform regression between peat thickness and distance from the margin across the basin.

Instead, we developed a spatially explicit RF regression model to predict peat thickness, derived from 14 remotely sensed potential covariates that may explain variation in peat thickness. These 14 variables included the nine optical, radar and topographic products used in the peatland-extent analysis, as well as distance from the peatland margin, distance from the nearest drainage point (same reference network as for HAND)³⁷, precipitation seasonality⁴⁸, climatic water balance (mean annual precipitation⁴⁸ minus mean annual potential evapotranspiration⁴⁹) and live woody above-ground biomass⁵⁰. Ten of these variables were found to be significantly correlated with peat thickness (Kendall’s τ, P < 0.01): all three optical bands, all three radar bands, distance from the peatland margin, distance from the nearest drainage point, precipitation seasonality and climatic water balance. Applying stepwise backwards selection, we tested combinations of these ten predictors by each time dropping one predictor out of the model in order from low to high variable importance, selecting as the best model the one with highest median R² and lowest median RMSE obtained from 100 random (two-thirds) CVs. The importance of each variable was assessed by calculating mean decrease impurity, the total decrease in the residual sum of squares of the regression after splitting on that variable, averaged over all decision trees in the RF. Median mean decrease impurity was calculated for each variable on the basis of 100 random (two-thirds) CVs of the overall model containing all ten significant predictors.

The best model contained four predictors: distance from the peatland margin, distance to the nearest drainage point, climatic water balance (all positively correlated with peat thickness; Kendall’s τ coefficient = 0.49, 0.15 and 0.13, respectively; P < 0.001 for all) and precipitation seasonality (negatively correlated with thickness; Kendall’s τ = −0.11; P < 0.01); see Extended Data Fig. 6 for their spatial variability.

The RF regression was implemented in GEE with 500 trees and all other parameters set to default values. Predictor variables were resampled to 50 m resolution. As training data, we included all LOI-verified and corrected pole-method thickness measurements that fell within the masked map of >50% peat probability (n = 463), including thickness >0 and <0.3 m from non-peat sites that could improve predictions of shallow peat deposits near the margins (n = 12).

Our final RF model (R² = 93.4%;, RMSE = 0.42 m) had consistently smaller residuals compared with a multiple linear regression model containing the same four predictors with interaction effects (adjusted R² = 73.6%;, RMSE = 0.80 m; Extended Data Fig. 5). It also performed better when testing out-of-sample performance, using 100 random two-thirds CVs of training data (median R² = 82.2%, RMSE = 0.68 m and median adjusted R² = 73.6%, RMSE = 0.85 m for RF model and multiple linear regression, respectively).

For uncertainty on our thickness predictions, we first estimated area uncertainty by creating 100 different maps of distance from the peat margin by randomly selecting (with replacement) a minimum peat probability threshold >0% and <100%, removing speckle and noise, and re-calculating the closest distance to the nearest non-peat pixel. We then combined the 100 distance maps each time with the three other selected predictors (precipitation seasonality, climatic water balance and distance from nearest drainage point) as input in an RF model to develop 100 different peat-thickness maps. For these model runs, we included all available thickness measurements (>0 m) that fell within each specific distance map. Each output map was masked to an area ≥0.3 m thickness, consistent with our peat definition. A map of median peat thickness (Fig. 4a) and relative uncertainty (± half the width of the 95% CI as percentage of the median; Fig. 4b) was then calculated for each pixel on the basis of the 100 available thickness estimates.

Carbon-stock estimates

We mapped carbon density across the central Congo Basin in GEE by applying 20 bootstrapped thickness–carbon regressions that were normally distributed around the best fit (Extended Data Fig. 7) to the 100 peat-thickness maps from the RF regression model, generating a map of median carbon density out of 2,000 estimates (Fig. 4a), together with relative uncertainty (± half the width of the 95% CI as percentage of the median; Fig. 4b).

Total peat carbon stocks were computed in GEE by summing carbon density (Mg ha⁻¹) over all 50 m grid squares defined as peat. To assess uncertainty around this estimate, we again combined the 100 peat-thickness maps (uncertainty from area and thickness) with 20 bootstrapped thickness–carbon regressions (uncertainty from carbon density, including bulk density and carbon concentration). We thus obtained 2,000 peat carbon-stock estimates for the total central Congo Basin peatland complex, which were used to estimate the mean, median and 95% CI (Extended Data Fig. 8a).

Regional carbon-stock estimates were similarly obtained for each sub-national administrative region (departments in ROC and provinces in DRC; Extended Data Fig. 2), as well as national-level protected areas (national parks and nature/biosphere/community reserves)⁵¹ and logging^52,53, mining^54,55 and palm oil^56,57,58 concessions (Extended Data Figs. 9 and 10). As hydrocarbon concessions cover almost the whole peatlands area^24,26, they cover almost 100% of the central Congo peat carbon stocks.

Sensitivity analysis was performed by bootstrapping the area, thickness or carbon-density component, while keeping the others constant (Extended Data Fig. 8b). For area, we bootstrapped 100 randomly selected peatland area estimates; for thickness, 100 randomly selected two-thirds subsets of all thickness measurements; for carbon density, 20 normally distributed regression equations from the bootstrapped thickness–carbon relationship.

Mapping peat thickness and carbon stocks of the central Congo Basin using field data

Field-data collection

Peat-thickness laboratory measurements

Carbon-density estimates

Modelling peatland extent

Modelling peat thickness

Carbon-stock estimates

Seed choice in ground beetles is driven by surface-derived hydrocarbons

Farm size affects the use of agroecological practices on organic farms in the United States

ITALIAN LANGUAGE

ENGLISH LANGUAGE