Overview
This study establishes a framework for mapping carbon stocks at the level of individual trees at a sub-continental scale in semi-arid sub-Saharan Africa north of the Equator. We used satellite imagery from the early dry season (Extended Data Fig. 1). The deep learning method developed by a previous study1 allowed us to map billions of discrete tree crowns at the 50-cm scale from West Africa to the Red Sea. Then we used allometry to convert tree crown area into tree wood, foliage and root carbon for the 0–1,000 mm year−1 precipitation zone in which our allometry was collected (Extended Data Fig. 2). We introduce a viewer that enables the billions of trees to be viewed at different scales, with information on location, metadata of the Maxar satellite image used, tree crown area and the estimated wood, foliage and root carbon content based on our allometry (Fig. 4). We also make available our output data for the 1,000 mm year−1 precipitation zone southward to 9.5° N latitude with information on location, precipitation, metadata of the Maxar satellite image used, tree crown area, tree wood carbon, tree root carbon and tree leaf carbon.
Satellite imagery
We used 326,523 Maxar multispectral images from the QuickBird-2, GeoEye-1, WorldView-2 and WorldView-3 satellites collected from 2002 to 2020 from November to March from 9.5° N to 24° N latitude within Universal Transverse Mercator (UTM) zones 28–37 for Africa (Extended Data Table 1a). These images were obtained by NASA through the NextView License from the National Geospatial-Intelligence Agency. Data were assembled over several years with a focus on later years to achieve a relatively recent and complete wall-to-wall coverage.
When combining satellite data acquired by different satellites over several years, with varying sun–target–satellite angles, varying radiometric calibration of the spectral bands and differing atmospheric compositions through which the surface is imaged, there are two possibilities for using hundreds of thousands of satellite images together quantitatively. One approach, used extensively in NASA's, NOAA's and the European Space Agency's Earth-viewing satellite programmes, is to radiometrically inter-calibrate the satellite channels through time; correct the data for time-dependent atmospheric effects such as aerosols, clouds, haze, smoke and dust; and then normalize the viewing perspective to the same sun–target–satellite angle38. Another approach is to use the satellite data as collected; assemble training data of trees viewed from different satellites, at different sun–target–satellite angles and times and under different atmospheric conditions; and use machine learning with high-performance computing to perform the tree mapping at the 50-cm scale. The key to successful machine learning is to capture, in the training data, all the sources of variation within the domain of study, so that trees are identified accurately under all circumstances. We included trees viewed substantially off-nadir, trees imaged under different aerosol optical thicknesses and under cirrus cloud conditions, trees viewed in the forward and backward scan directions, and trees on sandy soils, clay soils, burn scars, laterite areas and riverine settings. Our training data, collected by one team member, are a carefully selected manual delineation of 89,899 individual trees covering this range of atmospheric conditions, viewing perspectives and ecological settings.
All multispectral and panchromatic bands associated with our Maxar images were orthorectified to a common mapping basis. We next pan-sharpened all multispectral bands to the 0.5-m scale with the associated panchromatic band. The absolute locational uncertainty of pixels at the 0.5-m scale from orbit is approximately ±11 m, considering the root-mean-square location errors among the QuickBird-2, GeoEye-1, WorldView-2 and WorldView-3 satellites (Extended Data Table 1). We formed the normalized difference vegetation index (NDVI)39 from every image in the traditional way from the pan-sharpened red and near-infrared bands. We also associated the panchromatic band with the NDVI band and ensured that the panchromatic and NDVI bands were highly co-registered. The NDVI was used to distinguish tree crowns from non-vegetated background because the images were taken from a period when only woody plants were photosynthetically active in this area36. Our training data were labelled on images from the early dry season when only trees have green leaves. Because most semi-arid savannah trees continue to photosynthesize in the early dry season after herbaceous vegetation senesces, green leaf tree crowns are easily mapped because of their higher NDVI values than their senescent herbaceous vegetation surroundings. We substantiate this by analysis of 308 individual trees using NDVI time series with 4-m PlanetScope imagery that emphasized the importance of satellite data from the November, December and January early dry-season months (Extended Data Fig. 1).
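As an illustration of this step (not the production pipeline), NDVI can be computed per pixel from the pan-sharpened red and near-infrared bands; the file names and band indices below are assumptions.

```python
import numpy as np
import rasterio

def compute_ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Return NDVI = (NIR - red) / (NIR + red), guarding against division by zero."""
    red = red.astype(np.float32)
    nir = nir.astype(np.float32)
    denom = nir + red
    return np.where(denom > 0, (nir - red) / denom, 0.0)

# Hypothetical pan-sharpened input; band indices depend on the actual product.
with rasterio.open("pansharpened_tile.tif") as src:
    red = src.read(3)   # assumed red band index
    nir = src.read(4)   # assumed near-infrared band index
    profile = src.profile

ndvi = compute_ndvi(red, nir)

profile.update(count=1, dtype="float32")
with rasterio.open("ndvi_tile.tif", "w", **profile) as dst:
    dst.write(ndvi, 1)
```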
We next formed our data into mosaics by applying a set of decision rules, resulting in a collection of 16 × 16-km tiles within each UTM zone from 9.5° N to 24° N latitude for Africa. The initial round of scoring considered percentage cloud cover, sun elevation angle and sensor off-nadir angle: preference was given to imagery that had lower cloud cover, then higher sun elevation angle and finally view angles closest to nadir. In the second round of scoring, selections were assigned priority to favour early dry-season months and view angles close to nadir: preference was given to imagery from November, December and January with off-nadir angle less than ±15°; second to imagery from November to January with off-nadir angle between ±15° and ±30°; third to imagery from February or March with off-nadir angle less than ±15°; and finally to imagery from February or March with off-nadir angle between ±15° and ±30°. Image mosaics were necessary to eliminate multiple counting of trees. We formed mosaics using 94,502 images for tree segmentation. Ninety percent of the selected mosaic imagery was within ±15° of nadir, 87% was acquired between 2010 and 2020 and 94% was from the early dry-season months of November, December and January (Extended Data Fig. 7). A summary of month, year, solar elevation and off-nadir angle by UTM zone can be found in Supplemental Information Fig. 1.
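The two scoring rounds amount to a ranking of candidate images. The following is a minimal sketch of one way to encode that ranking; the metadata field names are ours, and the exact tie-breaking used in the study may differ.

```python
from dataclasses import dataclass

@dataclass
class ImageMeta:
    month: int            # acquisition month (1-12)
    cloud_pct: float      # percentage cloud cover
    sun_elev_deg: float   # sun elevation angle
    off_nadir_deg: float  # off-nadir view angle

def priority_class(m: ImageMeta) -> int:
    """Second-round priority: early dry season (Nov-Jan) and near-nadir views first."""
    early = m.month in (11, 12, 1)
    near_nadir = abs(m.off_nadir_deg) < 15
    if early and near_nadir:
        return 0
    if early and abs(m.off_nadir_deg) <= 30:
        return 1
    if m.month in (2, 3) and near_nadir:
        return 2
    return 3

def sort_key(m: ImageMeta):
    # Lower cloud cover, then higher sun elevation, then view closest to nadir,
    # ordered within the early-dry-season/near-nadir priority classes.
    return (priority_class(m), m.cloud_pct, -m.sun_elev_deg, abs(m.off_nadir_deg))

candidates = [
    ImageMeta(month=12, cloud_pct=0.0, sun_elev_deg=48.0, off_nadir_deg=9.0),
    ImageMeta(month=2, cloud_pct=1.5, sun_elev_deg=55.0, off_nadir_deg=22.0),
]
best = min(candidates, key=sort_key)
```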
Possible obscuration of the surface by clouds totalled 4.1% of our input mosaic data area, and areas with aerosol optical depth >0.6 at 470 nm (ref. 40) totalled 3.4% of our input data. However, we mapped 691,477,772 trees within these possibly cloud-affected and aerosol-affected areas, indicating that the actual obscuration by clouds and aerosols was lower than these percentages suggest. In addition, 0.9% of our input data did not process. We include a data layer in our viewer flagging these three conditions.
Mapping tree crowns with deep learning
We used convolutional neural network models developed by a previous study1. The models were trained with 89,899 manually delineated and annotated individual trees along a north–south gradient from 0 to 1,000 mm year−1 rainfall1. Only features that showed a distinct crown area and associated shadow were included, which excluded small bushes, grass tussocks, rocks and other features that might have green leaves or cast a shadow from our classification. All training data collection and model training were done in UTM zones 28 and 29. Because tree floristic diversity in the 0–1,000 mm year−1 zone of our study is highly similar from the Atlantic Ocean to the Red Sea across Africa41,42,43, we added no further training data as our study moved further eastward. We used state-of-the-art deep learning to segment tree crowns at the 50-cm scale1. We used two different models based on a U-Net architecture, one for lower-rainfall desert regions with <150 mm year−1 precipitation and one for regions with average annual precipitation >150 mm year−1. Details about the network architecture, training process and hyperparameter choices can be found in ref. 1. Previous evaluation showed that early dry-season images performed better than late dry-season images, which was a limitation of our previous study. We reduced this error by using early dry-season images, with only 6% of our area covered by images from February and March. The models were also designed to separate clumped trees by highlighting the spaces between adjacent crowns during the learning process, similar to a strategy used for separating touching cells in microscopic imagery22.
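The crown-separation strategy borrowed from cell segmentation22 is typically implemented as a per-pixel weight map that up-weights the narrow background gaps between nearby objects during training. The following is a minimal sketch of such a weight map using scipy; it illustrates the idea rather than the exact weighting used in ref. 1.

```python
import numpy as np
from scipy import ndimage

def gap_weight_map(labels: np.ndarray, w0: float = 10.0, sigma: float = 5.0) -> np.ndarray:
    """Per-pixel training weights that emphasise gaps between nearby crowns.

    labels: integer mask, 0 = background, 1..N = individual crown instances.
    Weights follow a U-Net-style scheme w = 1 + w0 * exp(-(d1 + d2)^2 / (2 * sigma^2)),
    where d1 and d2 are the distances to the two nearest crowns.
    """
    ids = np.unique(labels)
    ids = ids[ids > 0]
    if len(ids) < 2:
        return np.ones(labels.shape, dtype=np.float32)

    # Distance from every pixel to each crown instance.
    dists = np.stack([ndimage.distance_transform_edt(labels != i) for i in ids])
    dists.sort(axis=0)
    d1, d2 = dists[0], dists[1]

    w = 1.0 + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))
    w[labels > 0] = 1.0  # only background pixels between crowns get extra weight
    return w.astype(np.float32)

# Example: two crowns separated by a one-pixel gap receive high weight in that gap.
toy = np.zeros((8, 8), dtype=int)
toy[2:6, 1:4] = 1
toy[2:6, 5:8] = 2
weights = gap_weight_map(toy)
```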
Allometry
Very-high-resolution satellite images and deep learning have achieved mapping of individual trees over large areas1. Each tree is georeferenced in the satellite data and defined by its crown area. The challenge was to develop allometric equations for foliage, wood and root dry masses or carbon based on crown area, regardless of species. This was met by reanalysing existing Sahelian and Sudanian woody plant data from destructive sampling. Overall, the seasonal maximum foliage, wood and root dry masses were measured on 900, 698 and 26 trees or shrubs from 27, 26 and 5 species, respectively, for which crown area was also measured. Several allometric regression models were tested for foliage, wood and root masses; those selected are power functions of crown area and are independent of species. All the regression outputs were inter-compared using fit indicators, systematic estimates of prediction uncertainty, and root-to-wood and foliage-to-wood ratios over the range of crown areas. This resulted in a set of ordinary least squares log–log equations with crown area as the independent variable. The Sahelian and Sudanian allometry equations were also compared with published allometry equations for tropical trees, primarily from the more humid tropics, which are generally based on stem diameter, tree height and wood density. Our allometric predictions are within the range of other allometry predictions, reinforcing confidence in their use beyond the Sahelian and Sudanian domains into sub-humid savannahs for discrete trees19.
On the basis of ref. 19, we predicted the wood (w), foliage (f) and root (r) dry mass as functions of the crown area (A) of a single tree as:
$$\begin{array}{c}{\text{mass}}_{\text{w}}(A)=3.9448\times A^{1.1068}\,\,({N}_{\text{w}}=698)\\ {\text{mass}}_{\text{f}}(A)=0.2693\times A^{0.9441}\,\,({N}_{\text{f}}=900)\\ {\text{mass}}_{\text{r}}(A)=0.8339\times A^{1.1730}\,\,({N}_{\text{r}}=26)\end{array}$$
The tree mass components of wood, leaves and roots were combined to predict the total mass(A) in kg of a tree from its crown area A in m2:
$$\text{mass}(A)={\text{mass}}_{\text{w}}(A)+{\text{mass}}_{\text{f}}(A)+{\text{mass}}_{\text{r}}(A)$$
As in ref. 1, a crown area of size A > 200 m2 was split into \(\lfloor A/100\rfloor\) areas of size 100 m2 and, if necessary, one area with the remaining m2. We converted dry mass to carbon by multiplying by a factor of 0.47 (ref. 44).
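The allometry, the crown-splitting rule and the carbon conversion can be applied per tree as follows; this is a minimal sketch and the function names are ours.

```python
import math

CARBON_FRACTION = 0.47  # dry mass to carbon (ref. 44)

def mass_components(area_m2: float) -> tuple[float, float, float]:
    """Wood, foliage and root dry mass (kg) from crown area (m^2), per ref. 19."""
    wood = 3.9448 * area_m2 ** 1.1068
    foliage = 0.2693 * area_m2 ** 0.9441
    root = 0.8339 * area_m2 ** 1.1730
    return wood, foliage, root

def tree_carbon(area_m2: float) -> float:
    """Total tree carbon (kg); crowns > 200 m^2 are split into 100 m^2 pieces plus a remainder."""
    if area_m2 > 200:
        pieces = [100.0] * math.floor(area_m2 / 100)
        remainder = area_m2 - 100 * math.floor(area_m2 / 100)
        if remainder > 0:
            pieces.append(remainder)
    else:
        pieces = [area_m2]
    total_mass = sum(sum(mass_components(a)) for a in pieces)
    return CARBON_FRACTION * total_mass

# Example: carbon (kg) for a 12 m^2 crown.
print(round(tree_carbon(12.0), 1))
```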
Uncertainty analysis
We evaluated the uncertainty of our tree crown area mapping and carbon estimation in two ways. First, we quantified our tree crown mapping omission and commission errors by inspecting randomly selected areas from UTM zones 28–37, validating that our neural network generalized over UTM zones consistently (Extended Data Fig. 8).
Second, we quantified the relative error of our tree crown area estimation. We consider the uncertainty Δx of a quantity x and the corresponding relative uncertainty δx defined by the absolute and relative error, respectively45. To assess the relative error in crown area estimation resulting from errors by the neural network, we considered external validation data from ref. 1, which were not used in the model-building process. We considered expert-labelled tree crowns as well as the predicted tree crowns from 78 plots of 256 × 256 pixels. The hand-labelled set contained 5,925 trees and the system delineated 5,915 trees. The total hand-labelled tree crown area was 118,327 m2 and the neural network predicted 121,898 m2. This gave a relative error in crown area mapping of δarea = 3.3%. We matched expert-labelled and predicted tree crowns and computed the root-mean-square error (RMSE) per tree, taking overlapping areas and missed trees into account (see Extended Data Fig. 8). We estimated the allometric uncertainty (δallometric) using the data from ref. 19 (see below). The two relative errors δarea and δallometric were combined to an overall uncertainty estimate for the carbon prediction of ±19.8% (see below).
Omission and commission errors
We evaluated our tree crown mapping accuracy by analysis of 1,028 randomly selected 512 × 256-pixel areas over the 9.5° N to 24° N latitude range within UTM zones 28–37. Because the drier 60% of our study area contains only 1% of the 9,947,310,221 trees we mapped in the 0–1,000 mm year−1 rainfall zone, we applied an 80% bias towards selecting evaluation areas above the 200 mm year−1 precipitation line46, as >98% of tree identifications were above the 200 mm year−1 precipitation isoline. Identified tree polygons were further categorized into tree crown area classes of 3–15 m2, 15–50 m2, 50–200 m2 and >200 m2, with a total of 50,570 trees evaluated. Although a previous study reported greatest uncertainty in both the smallest and largest area classes1, our more expansive work found the greatest uncertainty in our smallest tree class. We excluded from evaluation any tiles that had annual precipitation46 >1,000 mm year−1 and all areas that were devoid of vegetation, leaving us with 850 areas.
Seven members of our team evaluated the accuracy in terms of commission and omission by tree crown area classes for the 850 areas. Input data provided for every area were the NDVI layer, the panchromatic shadow layer and the neural net mapping results in each of the four crown area classes. Ancillary data available to evaluators included the centre coordinates for comparison with Google Earth data, the Funk et al.46 rainfall, the acquisition date of the area evaluated and the viewing perspective.
We identified areas wrongly classified as tree crowns (commission errors), missed trees (omission errors) and crown areas corresponding to clumped trees (Extended Data Fig. 8). Clumped trees were most common for >200 m2 tree crown area. They were rare in the 3–15 m2 and 15–50 m2 tree classes, which comprise 88% of our tree crowns. In the 850 patches, the number of trees ranged from one tree to 326 trees, with a total of 50,570 trees evaluated and 3,765 errors identified. Overall, the commission and omission error rates were 4.9% and 2.7%, respectively, a net uncertainty of 2.2%.
Allometric uncertainty estimation
The prediction of carbon for a single tree based on its crown area alone is inherently uncertain47,48. As the allometric equations are based on three different datasets, we compute their uncertainties independently, combine them and put them in relation to the total carbon measured in the three datasets.
The allometric equations were established using an optimal least-squares fit of an affine linear model predicting the logarithmic carbon from the logarithmic tree crown area19. To estimate the uncertainty of the allometric equations, we repeated the fitting using random subsampling. The datasets were randomly split into training data (80%) for fitting the allometric equations and validation data (20%) for assessing the uncertainty. For example, from the root measurements \((A_1,y_1),\ldots,(A_{N_{\text{r}}},y_{N_{\text{r}}})\), we compute \(\mu_{\text{r}}=\frac{1}{N_{\text{r}}}\sum_{i=1}^{N_{\text{r}}}y_i\) and \(\hat{\mu}_{\text{r}}=\frac{1}{N_{\text{r}}}\sum_{i=1}^{N_{\text{r}}}{\text{mass}}_{\text{r}}(A_i)\). The corresponding error is \(\Delta_{\text{r}}=|\mu_{\text{r}}-\hat{\mu}_{\text{r}}|\).
Because the total carbon for a tree with a certain crown area is the sum of the three carbon components, we add the absolute uncertainties assuming independence45,

$$\Delta_{\text{allometric}}\simeq\sqrt{\Delta_{\text{f}}^{2}+\Delta_{\text{w}}^{2}+\Delta_{\text{r}}^{2}}$$
and compute the relative uncertainty as \(\delta_{\text{allometric}}=\frac{\Delta_{\text{allometric}}}{\mu_{\text{mass}}}\), in which the average mass μmass is given by the sum of the averages for wood (μw), leaves (μf) and roots (μr). This process was repeated ten times, resulting in a mean relative uncertainty of

$$\bar{\delta}_{\text{allometric}}=19.5\%.$$
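A minimal sketch of one subsampling round for a single component (roots), using placeholder data arrays; the study repeats this ten times and combines the wood, foliage and root errors as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_power_law(area: np.ndarray, mass: np.ndarray) -> tuple[float, float]:
    """OLS fit of log(mass) = log(a) + b*log(area), returning (a, b)."""
    b, log_a = np.polyfit(np.log(area), np.log(mass), 1)
    return np.exp(log_a), b

def component_error(area: np.ndarray, mass: np.ndarray, train_frac: float = 0.8) -> tuple[float, float]:
    """One round: fit on 80%, compare mean measured vs. mean predicted mass on the held-out 20%."""
    idx = rng.permutation(len(area))
    n_train = int(train_frac * len(area))
    tr, va = idx[:n_train], idx[n_train:]
    a, b = fit_power_law(area[tr], mass[tr])
    mu = mass[va].mean()                 # mean measured mass
    mu_hat = (a * area[va] ** b).mean()  # mean predicted mass
    return abs(mu - mu_hat), mu

# Placeholder destructive-sampling data (crown area in m^2, root dry mass in kg).
area_r = rng.uniform(1, 150, size=26)
mass_r = 0.83 * area_r ** 1.17 * rng.lognormal(0, 0.3, size=26)

delta_r, mu_r = component_error(area_r, mass_r)
# Repeating this for wood and foliage and combining as
# sqrt(df^2 + dw^2 + dr^2) / (mu_w + mu_f + mu_r) gives one estimate of delta_allometric.
```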
Total carbon uncertainty
We combine the uncertainties from the neural net mapping and our allometric equations, which can be viewed as considering (1 + A)·(1 + B), with A and B being random variables with standard deviations δarea and δallometric. Neglecting higher-order and interaction terms, we combine the two sources of uncertainty to \(\delta\simeq\sqrt{\delta_{\text{area}}^{2}+\bar{\delta}_{\text{allometric}}^{2}}\), resulting in an uncertainty in total tree carbon for our study of ±19.8%. See also Extended Data Fig. 9 for the RMSEs of our predicted crown areas calculated on external validation data from ref. 1, binned on the basis of the 50th quantiles of the hand-labelled crown areas and also converted into carbon. Extended Data Fig. 10 is a flow diagram summarizing our methods.
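Numerically, with δarea = 3.3% and \(\bar{\delta}_{\text{allometric}}\) = 19.5%, the root-sum-of-squares gives

$$\delta\simeq\sqrt{0.033^{2}+0.195^{2}}\approx 0.198=19.8\%.$$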
Our viewer
Visualizing our large tree-mapping dataset in an interactive format was essential for quality control, exploration of the data and hypothesis creation. A web-based viewer serves as the initial point of interaction with our dataset for fellow researchers, local stakeholders and the general public. Visualizing more than 10 billion trees in a web browser required maintaining performance and interactivity while preserving individual metadata for each polygon. Users should be able to zoom in to any area within the dataset to view individual tree polygons and query their statistics, while lower zoom levels should still accurately depict the overall trends of the dataset. The visualization also needed to clearly denote where data were missing or possibly affected by clouds or aerosols. Finally, the extent and origin of the source imagery, its acquisition date and a preview of the imagery needed to be available. To accomplish these goals, a vector-tile-based approach was taken, with the data visualized in a Mapbox GL JS map within a React web application. To create vector tiles covering the entire study area, we developed a data-processing pipeline using high-performance computing resources to transform the data into compatible formats, as well as to package, optimize and combine the vector tiles themselves.
We used two tracks to store and visualize the results of this study on the web: vector polygon data and generalized rasters representing tree crown density. At the native spatial resolution of 50 cm, the map shows the full-resolution tree polygon dataset. At lower-spatial-resolution zoom levels, rasterized representations of tree density are shown. Visualizing generalized rasters in place of vector polygons improves performance substantially. As users zoom in to higher spatial resolutions, the raster layer fades away and is replaced by the full-resolution polygon layer. Once zoomed far enough to resolve individual polygons, users can click to select a polygon to show a map overlay containing various properties of the tree, as well as the date on which the source imagery was acquired and a link to preview the source imagery.
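As an illustration only (the study's high-performance-computing pipeline is not reproduced here), polygon data can be packaged into vector tiles with the open-source tool tippecanoe; the file names, layer name and zoom range below are assumptions.

```python
import subprocess

# Build vector tiles from a (hypothetical) GeoJSON of tree-crown polygons.
# Zoom range and layer name are illustrative, not the study's actual settings.
subprocess.run(
    [
        "tippecanoe",
        "-o", "tree_crowns.mbtiles",   # output tileset
        "-l", "crowns",                # layer name
        "-Z", "10", "-z", "15",        # serve polygons only at high zoom levels
        "--drop-densest-as-needed",    # thin features where tiles would be too large
        "crowns.geojson",
    ],
    check=True,
)
```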
Rainfall data
We used the rainfall data of Funk et al. to estimate annual rainfall on a 5.6-km grid46. We averaged the available data from 1982 to 2017, bilinearly interpolated the mean annual rainfall to 100 × 100-m resolution and extracted the value for each mapped tree. The rainfall data were also used to classify the study area into mean annual precipitation zones: hyper-arid (0–150 mm year−1), arid (150–300 mm year−1), semi-arid (300–600 mm year−1) and sub-humid (600–1,000 mm year−1). The rainfall data are found at https://data.chc.ucsb.edu/products/CHIRPS-2.0/africa_monthly/ (ref. 46).
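A minimal sketch of sampling mean annual rainfall at tree locations and assigning the precipitation zones defined above; the raster file name and tree coordinates are placeholders, and nearest-neighbour sampling stands in here for the bilinear interpolation used in the study.

```python
import rasterio

def precip_zone(p_mm: float) -> str:
    """Classify mean annual precipitation into the zones used in this study."""
    if p_mm < 150:
        return "hyper-arid"
    if p_mm < 300:
        return "arid"
    if p_mm < 600:
        return "semi-arid"
    if p_mm <= 1000:
        return "sub-humid"
    return "humid (outside study domain)"

# Hypothetical tree centroids (lon, lat) and a pre-averaged CHIRPS mean-annual raster.
tree_lonlat = [(-15.6, 14.7), (3.2, 13.1)]
with rasterio.open("chirps_mean_annual_1982_2017.tif") as src:
    rainfall = [v[0] for v in src.sample(tree_lonlat)]

zones = [precip_zone(p) for p in rainfall]
```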