Study area and field data
The dataset was generated over NYC, located in the north-eastern United States (40.713° N, 74.006° W). NYC has a total area of 778.2 km2, which is composed of five boroughs, i.e. Brooklyn, Queens, Manhattan, Bronx, and Staten Island (Fig. 1). There were 296 field plots randomly sampled (Fig. 1a) and measured in the summer of 2013 over NYC following the i-Tree Eco protocols9 developed by the United State Department of Agriculture Forest Service (USFS). Each plot occupied a circular area of 404.7 m2. All the trees with a Diameter at Breast Height (DBH) larger than 2.54 cm were surveyed to record their tree height, species, DBH, and other structural attributes. Within all the 296 plots across NYC, there were 1,075 trees in 139 species surveyed. The species types with the top ten largest sample size were Acer platanoides (65 samples), Cedrus species (59 samples), Ailanthus altissima (58 samples), Sassafras albidum (56 samples), Quercus alba (51 samples), Betula lenta (42 samples), Robinia pseudoacacia (39 samples), Acer rubrum (38 samples), and Hardwood species (37 samples). Because the exact coordinates of individual trees were not collected, we mainly used the plot-level tree attributes (i.e. tree number, mean tree height, and maximum tree height) to validate LiDAR-derived products. Due to confidential requirements, the exact coordinates of field plots were not allowed to be released.
Aerial image and land cover maps
A fine-resolution land cover dataset (0.91 m spatial resolution) provided via NYC OpenData (https://opendata.cityofnewyork.us/) was used to mask out non-vegetation areas. This land cover dataset was generated using an object-based image classification method5 from LiDAR data collected in 2010 and NAIP aerial imageries in 2009. This final land cover map includes seven classes, i.e., tree canopy, grass/shrub, bare earth, water, buildings, roads, and other paved surfaces (Fig. 1b). We regrouped the land cover map into vegetation (tree canopy and grass/shrub) and non-vegetation groups, and resampled the map into 1 m resolution to match with NAIP and LiDAR datasets. We also collected NAIP imagery in the summer of 2013 for tree structural estimation from the Google Earth Engine platform36. The NAIP image had a resolution of 1 m with four spectral bands (Red, Green, Blue, and Near Infrared). We further calculated the NDVI from the Red and Near Infrared bands of NAIP images for tree structural estimation.
LiDAR data and processing
The LiDAR data were collected using a Leica ALS70 LiDAR system from two flight missions (https://noaa-nos-coastal-lidar-pds.s3.amazonaws.com/laz/geoid18/4920/index.html). The first LiDAR flight was taken on August 5th, 2013 at 2,286 m above ground level with an average side lap of 30%. The LiDAR data from this flight had a nominal pulse spacing of 0.91 m. The second flight was taken between March and April, 2014 at 2,286 m above ground level with an averaged side lap of 25% and a nominal pulse spacing of 0.7 m. According to the ground control survey, the LiDAR scan had a root mean square error accuracy of 9.25 cm. With up to 7 returns per pulse, the final LiDAR dataset has a point density of 5.9 points/m2.
The tree structural information was mainly generated from LiDAR-derived CHM. The CHM was the difference between Digital Surface Model (DSM) and Digital Terrain Model (DTM) generated from LiDAR point clouds using the Kriging interpolation method37. All the raster layers (CHM, DSM, DTM) were generated at 1 m resolution using the LiDAR360 software (GreenValley International). We generated a Tree Canopy Cover (TCC) map by masking out non-vegetation land cover types from areas with CHM values larger than 2 m. The TCC was a binary map with the value of one indicating tree cover and zero indicating non-tree cover at 1 m resolution. The non-vegetation areas were derived from the land cover map (Fig. 1b). The 2 m tree canopy height threshold was chosen by referencing a commonly accepted canopy height threshold38.
Individual tree segmentation and feature estimation
Individual tree crowns were segmented from LiDAR-derived CHM using the Marker-controlled Watershed Segmentation algorithm. This algorithm was widely adopted for LiDAR-based tree crown segmentation25,26,29 because it takes the advantages of both region-growing and edge-detection methods39. Due to the relatively low LiDAR point density, the CHM contained abnormal pits even after masking out non-tree-canopy pixels. We applied a Gaussian filter with two standard deviations to smooth the CHM and fill these pits in CHM. Then the segmentation was applied with a 3 × 3 moving window. Both smoothing and segmentation were conducted using the System for Automated Geoscientific Analyses software40. To refine the segmentation results, we deleted small segments with an area smaller than 1 m2 (one CHM pixel), which was most likely to be noise in CHM. We also visually examined and manually re-segmented extremely large segments by assuming most tree crowns should not exceed an area of 200 m2. The final tree crown dataset only contains segments with a maximum CHM value no less than 5 m because vegetation with lower height was mostly likely to be non-tree. All the post-segmentation operations were conducted in ESRI ArcMap 10.8.
We estimated five tree structural features for each individual trees, which include tree top height, tree mean height, crown area, tree volume, and carbon storage. Tree top height (m) characterizes the height from ground to tree top, estimated as the maximum CHM value within each tree crown segment. Tree mean height (m) indicates the average height of the tree crown surface, calculated as the mean CHM values within each tree segment. Tree crown area (m2) is the total area of each tree crown segment. Tree volume (m3) is the volume of 3D space occupied by the tree crown25, which was calculated as the volume difference between crown surface (defined by CHM) and crown base (Eq. 1). Because the tree crown base height was difficult to estimate for individual trees due to the relatively low LiDAR point density, we used the 2 m to approximate the averaged crown base height according to Ma et al.25. The sensitivities of crown volume to the selection of crown base height from 1 m to 5 m was presented in the Technical Validation section.
$$Volume={sum }_{i=1}^{n}left(CHMi-crown;base;heightright)times 1{m}^{2}$$
(1)
Where CHMi is the CHM values of the ith pixels within a tree segment, n is the total number of pixels within a tree segment. 1m2 is the area of each CHM pixel.
The carbon storage (ton) was defined as the total carbon stock in both above- and below-ground biomass of each tree. The carbon storage was estimated in two steps: (1) calculating tree biomass from field measurements using allometric equations41; (2) running a regression between field measured carbon storage and LiDAR-derived tree structural features42 and applying the regression model to individual trees. In step (1), we applied species-specific allometric equations from i-Tree Eco database. There are more than 50 species-specific equations in i-Tree Eco, which can be summarised into four main equation forms with different coefficient values (Eqs. 2–5).
$$Biomass=exp left({beta }_{1}+{beta }_{2}ast LNleft(DBHright)+frac{{sigma }^{2}}{2}right)$$
(2)
$$Biomass=exp left({beta }_{1}+{beta }_{2}ast LNleft({{rm{DBH}}}^{2}ast {rm{H}}right)+frac{{sigma }^{2}}{2}right)$$
(3)
$$Biomass={beta }_{1}ast left(DB{H}^{{beta }_{2}}right)$$
(4)
$$Biomass={beta }_{1}ast left({left({{rm{DBH}}}^{2}ast {rm{H}}right)}^{{beta }_{2}}right)$$
(5)
Where β1 and β2 are species-specific coefficients, DBH is diameter at breast height, H is tree top height, σ2 is the variance of model errors, which is applied to correct the potential underestimations when back-transforming predictions from logarithmic scale to original scale. For other species that were not included in the i-Tree Eco database, the averaged results from the four equations were applied. These allometric equations (Eqs. 2–5) estimate the entire tree biomass including both above- and below-ground biomass, and the final carbon storage for each field plot was converted to carbon by a factor of 0.541.
In step 2), we compared different regression models to simulate carbon storage at plot scale using LiDAR data and NAIP imagery. First, we compared the single variable regression for carbon storage from NAIP-derived NDVI, LiDAR-derived TCC, LiDAR-derived CHMmean and CHMmax, respectively. The four metrics were calculated at 1 m resolution, masked out non-vegetation areas, and aggregated over each field plot. TCC was calculated as the percentage area with tree cover (CHM >2 m). CHMmean and CHMmax were calculated as the mean and maximum of all CHM values within each plot. We compared different regression algorithms, including linear, exponential, and quadratic regressions. We also compared the modelling efficiency and accuracy between using single and multiple variables by combing all the attributes together using the Random Forest regression model. Using the optimal regression model, we generated a carbon density map at 20 m spatial resolution (each pixel size is similar to the plot size of 404.7 m2) by dividing the total carbon storage by the pixel size (400 m2) in the unit of ton/ha (Eq. 6). More details of carbon density estimation can be found in our previous publication42. The carbon storage for each tree was calculated as the product of crown area and crown density (Eq. 7).
$$Carbon;densityleft(ton/haright)=0.5ast Biomass(ton)/left(400left({m}^{2}right)ast 0.0001left(ha/{m}^{2}right)right)$$
(6)
$$Carbon;storageleft(tonright)=Crown;arealeft(haright)ast Carbon;density(ton/ha)$$
(7)
Where Biomass is the total biomass for each pixel, which was 400 m2. ha is short for hectare, which is 10000 m2.
We further quantified the uncertainty range in carbon storage estimation by propagating the potential error in carbon density regression to tree level carbon storage estimation. We first calculated the 95% confidence interval of the best carbon density regression model, and applied the confidence interval to carbon storage estimation for individual trees. The predicted the upper and lower values for individual tree carbon storage were given in the final dataset and summarized in Table 1.
Block group level tree structure distribution mapping
Three sets of tree structural parameters were mapped at block group level, including tree density (the number of trees in each hectare), tree height (m), and carbon density (ton/ha). The mean values of tree density, tree top height, and carbon density within each block group of NYC. The block group boundary was downloaded from https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2014.html, which includes a total of 6392 block groups.
We also estimated the potential tree height and carbon density at the census block group level. We assumed the 95% of the tree height and carbon storage values within each block group at mapping time were their potential values, which most trees can achieve during their life time. Then, we calculated the difference in tree height and carbon density between potential values (95%) and mapping time values (mean) over each block group, and used them as the extra carbon storage that trees could achieve during their life time. It is to be noticed that in this study we did not consider the carbon loss by tree degradation or removal, or extra carbon gain through the tree planting and management.
Source: Ecology - nature.com