in

# The flowering of Atlantic Forest Pleroma trees

### Study site

The study covered the Brazilian Atlantic Forest domain (Fig. 2), which is located on the east coast of Brazil between latitudes 5° and 30° south, expanding over 500 km inland in the south. It consists of a total area of 1,085,151 km2 with limits defined by the Brazilian Ministry of the Environment48. The total area covered by Sentinel-2 tiles overlapping with the Atlantic Forest domain is ~2,006,959 km2. This latter area was used to compute the descriptive statistics of detections.

### Data

#### Sentinel 2 images

The pink or magenta blossoms of Pleroma trees were mapped using Sentinel-2 multi-spectral data with 10 m spatial resolution taken approximately every five days under the same viewing conditions. We used only Sentinel-2 images with Level-1C correction—which are orthoimage products, i.e. map projections of acquired images using a digital elevation model to correct ground geometric distortions—and delivered in images of 100 km × 100 km. Pixel radiometric measurements were provided in Top-Of-Atmosphere (TOA) reflectances (coded in 12 bits)49.

In the analysis, 213 Sentinel-2 tiles covering the Brazilian Atlantic Forest domain were used, totaling 2,006,959 km2 which is equivalent to ~20 billion Sentinel-2 pixels with 10 m spatial resolution (Fig. 2a). Amongst the 213 selected tiles, 36 had 2 orbits to download to obtain the full tile image due to the overlapping orbit paths (called replicates in the following text).

For each tile and replicate (213 + 36), the times series between 31 June 2016 and the 1 July 2020 was downloaded from the Google Cloud Storage Sentinel-2 repository (https://cloud.google.com/storage/docs/public-datasets/sentinel-2). To reduce the dataset size, we retained only images with less than 80% cloud cover; and, from the month outside the flowering months of the Pleroma trees (July to November), we kept only images with less than 25% cloud cover. The complete dataset was made up of 33,798 Sentinel-2 images.

Four spectral bands available at 10 m spatial resolution were used: Red (665 nm), Green (560 nm), Blue (490 nm) and NIR (842 nm). A border of 120 pixels with NA values was added to the image to produce images of 10240 × 10240 pixels to ease automation of the image analysis workflow, which generally works with 2n × 2n size pixel images. In our case here, the deep learning analysis was made with 128 × 128 pixel images and an additional 8 × 8 border. Sentinel L1C reflectance values are in the range of 0–10000 and were converted to 8 bits (0–254) with the following rules : for Red, Green and Blue bands, we kept the minimum value between 2540 and the original pixel value, divided this value by 10 and converted the result to integer; and for the NIR band, we keep the minimum value between 2540 and the original pixel value divided by a constant equaling 3.937, divided this value by 10 and converted the result to integer. While it was not expected to have RGB pixel values for vegetation with reflectance above 2540, it occured frequently for the NIR values. Dividing the NIR band values by the constant 3.937 enabled scaling the full range of the original NIR values between 0 and 2540 without losing too much information. For each tile, all 4 bands were saved in one GeoTIFF of 8 bits to ease storage and processing. The size of the complete dataset was 5.59 teraoctets. The automatic download, scaling and conversion of the images to 8 bits took about 25 days (from 16 July 2020 to 3 August 2020 and from 10 September 2020 to 13 September 2020).

#### Environmental data

To test the association of Pleroma trees with elevation and slope, elevation data from the Shuttle Radar Topography Mission (SRTM) were used50 (Fig. 12a). Specifically, we used the 3 arc-seconds (~90 m) spatial resolution digital elevation database (version 4) provided by the CGIAR Consortium for Spatial Information51. This dataset, in comparison to the original NASA STRM dataset, has been processed to fill data voids. From this dataset, we used the variables elevation (m) and computed slope (°) considering the four neighbor pixels (Fig. 12b). To analyse the relationship between Pleroma trees presence and forest tree cover, we used the tree cover percentage for the year 2000 at 30 m of spatial resolution, which we obtained from the global forest cover dataset (Fig. 12c), which is based on Landsat time series52.

The association of Pleroma trees with local climate was tested using the annual means of precipitation and air temperatures (Fig. 12d–f). The mean annual precipitation over the study period was computed from the CHIRPS v2p0 monthly precipitation dataset at 0.05° of spatial resolution produced by University of California, Santa Barbara (UCSB). CHIRPS data are global rainfall estimates from rain gauges and satellite observations53. The mean of maximum and minimum air surface temperatures over the study period were computed from the Aqua/AIRS L3 Daily Standard Physical Retrieval (AIRS-only) at 1° of spatial resolution V7.0 (AIRS3STD). AIRS, the Atmospheric Infrared Sounder on NASA’s Aqua satellite, gathers daily infrared energy emitted from Earth’s surface and atmosphere globally and provides 3D measurements of temperature and water vapor through the atmospheric column54. The annual mean of minimum and maximum air surface temperatures was calculated using the daily air surface temperature measured from the descending orbital pass, which occurs at 1:30 am local time (’SurfAirTemp_D’), and the ascending orbital pass, which occurs at 1:30 pm (’SurfAirTemp_A’).

Additionally, maps produced by the Brazilian Institute of Geography and Statistics (IBGE) of the geomorphological units and rivers of Brazil were used to describe the spatial distribution of the blossom detections33.

All environmental variables were resampled to a raster of 1280 × 1280 m spatial resolution using an average interpolation to match the resolution of the Pleroma tree detection dataset.

### Model

#### Neural network architecture

This detection model is a deep learning neural network (Fig. 13), more specifically an encoder with a VGG16-like structure35, that given an image (input image) return the probability of presence of Pleroma trees with flowers in the image. The model inputs are 4 bands RGB-NIR images made up of 136 × 136 pixels at 10 m of spatial resolution (Fig. 13). Sentinel-2 tiles of 10240 × 10240 pixels were cropped based on a regular grid of 128 × 128 pixels, and 4 neighbouring pixels were added on each side to create an overlap between the patches. The resulting images are 136 × 136 pixels in size. However, in the training, the presence or absence of blooming Pleroma was given only for the images of 128 × 128 pixels without consideration of the borders. This enable to avoidance of the border effect that is common in convolutional neural networks. Each image of 136 × 136 pixels goes through a data augmentation process that consists in random vertical and horizontal flips. No additional data augmentation necessary due to the natural data augmentation provided by atmospheric conditions and illumination. After data augmentation, the images were then fed to the detection encoder. The encoder was made up of 5 consecutive convolution and pooling blocks, one fully connected layer (dense 100) and a final output layer with a softmax activation that provided the probability of presence of blooming Pleroma trees in the image (Fig. 13). Additionally, one drop-out layer was used at the end of the fully connected layer to perform further implicit data augmentation and avoid overfitting during training. The model has a total of 25,448,686 parameters, of which 25,440,622 are trainable. The model was coded in R language55 with Rstudio interface to Keras and TensorFlow 2.256,57,58,59.

#### Network training

To make the training sample, a manual sample was produced for the Sentinel-2 tile 23KMQ, in the area where we had previously made a high resolution map of blooming Pleroma6, and for five other tiles where flowering Pleroma were detected visually from high resolution Google Earth images (22JFQ, 22JGQ, 23KLP, 23KLQ and 23KNQ, respectively). What is identified in the Sentinel-2 images are forest stands dominated by Pleroma and not single individuals. Pleroma trees have a small stature (8–12 m height) and crown of less than 10 m and one tree alone cannot influence sufficiently the reflectance to be clearly detectable in Sentinel-2 images. However, they occur very frequently clumped together, a common behaviour of this pioneer Genus. These flowering Pleroma dominated forest stands were easy to identify visually in the Sentinel-2 images because they combined several very distinctive features. First, an intense magenta-to-deep-purple color, which is an unusual color for other land covers in this ecological domain. Second, these identified Pleroma pixels should be undoubtedly identified as forested pixels and have a green color outside the blooming season. Third, Pleroma dominated forests often formed continuous magenta-to-deep-purple patches of size ranging from some 10 m × 10 m pixels to more than thousands of pixels and the shape of the patches tend to present linear features, likely representing the border of the space that was colonized by the Pleroma trees. Fourth, individual crowns were not visible, and the texture of the patches was very smooth during the blooming season with sometimes some inclusions of tree crowns of green color. Finally, texture of the Pleroma dominated forest stands outside the blooming season shown a smooth green texture, more homogeneous and with less shade than other forests. The first sample was constituted of images of background and of blooming Pleroma dominated forest stands that were following the previously described criteria. From this sample, we train a first model and applied it to the complete time series of Sentinel-2. From the results of this model, we obtained a first map of Pleroma trees and were also able to identify the main detection errors of this model, mainly clouds and dirt roads proximity with some unidentified agriculture fields or sometimes Eucalyptus plantation. The results of this first model were checked visually to produce a second larger sample (which was used for the results presented in this study) made up of images containing blooming Pleroma dominant to monospecific forest stands, a set of background images without blooming Pleroma and images identified erroneously by the first model as containing blooming Pleroma. While a large majority of the detected pixels were undoubtedly forest stands dominated by Pleroma trees, some other isolated trees of the genus Handroanthus (Ipê in Brazil or Lapacho in Argentina) with pink flowers and large crowns covering several pixels of 10 m × 10 m were also detected and kept in the training sample. For these particular Handroanthus trees, crowns were visible during and sometimes also outside the blooming season, which was not the case for detected Pleroma dominated forest stands. Finally, as our model detected also large Handroanthus trees, we must acknowledge that other tree species with highly similar features could also potentially being detected.

The final training samples comprised a total of 158,612 images of 136 × 136 pixels. Among them, 35,541 contained blooming Pleroma trees and 123,071 images contained only background. Among the background images, there was nine different images types: images without blooming Pleroma, i.e., background such as other land covers, urban structures, water surfaces and agriculture and other land uses (57,007), images with forests containing Pleroma but outside the flowering period (23,427), images with clearly identified detection errors mainly located in the east of the São Paulo state (12,965), clouds and detection errors in clouds (10,991), images clearly identified as detection errors near the State of Bahia (9030), other detection errors over Atlantic Forest (5843), images of forests without Pleroma trees during the season of blooming (2170), images with identified detection errors in the northern part of Atlantic Forest (1126) and images with no data (512). Of these images, 80% (126,890) were used for training and 20% (31,722) used for validation.

During network training, we used a standard stochastic gradient descent optimization, a binary cross-entropy loss and the optimizer RMSprop60 with a learning rate of 1e-4. We used the accuracy (i.e. the frequency with which the prediction matches the observed value) as the metrics of the model. However, due to the imbalance between the number of blooming Pleroma and background images, the metric of the model was weighted by one for the background and, for the Pleroma, by the ratio between the number of background images and the number of images containing blooming Pleroma: that is, ~3.5. The network was trained for 5000 epochs, where each epoch was made of 61 batches with 2048 images per batch and the model with the best weighted accuracy was kept for prediction (epoch 4331 and weighted accuracy of 99.58%). The training of the models took approximately 9 hours using a Nvidia RTX2080 Graphics Processing Unit (GPU) with an 8 GB memory.

#### Prediction

To avoid border effects, each 10240 × 10240 pixels Sentinel-2 image was cropped on a regular grid of 128 × 128 pixels (1280 × 1280 m), and 4 neighboring pixels were added on each side to create an overlap between the patches. The function gdal_retile61 was used for this operation. Prediction was then made for each subset image: for each image, the detection model returned 0 or 1 if a blooming Pleroma was found in the image. Then the results were spatialized again using the grid, but this time, each cell of the grid only received 1 value, the prediction, resulting in a raster of 80 columns and 80 rows and a spatial resolution of 1280 m, of the same extent as the Sentinel-2 image. The value of the pixels (1 or 0) indicated the presence or absence of blooming Pleroma trees in this squared area of 1280 m of side. Prediction using GPU of a single tile of 10240 × 10240 pixels took approximately 1 minute on a Nvidia GTX1080 with an 8 GB memory and 45 s on a Nvidia RTX2080 with an 8 GB memory. The prediction for the complete Sentinel-2 time series presented in this work took approximately 22 days using a Nvidia GTX1080 GPU—from the 30 October 2020 to the 20 November 2020.

### Spatio-temporal analysis

To analyse the seasonality of the detections, daily maps of flowering Pleroma detections were produced for the studied period on a grid overlapping the entire Atlantic Forest (projection UTM 23S and datum WGS84) with a spatial resolution of 1280 m to match the resolution of the predictions. For each day, each pixel of the grid was given a classification: observed with flowering Pleroma, observed without flowering Pleroma, observed with clouds (using the cloud cover mask for Sentinel-2 images of this day) or as non-observed (no image or NA data for the pixel on that day). These daily grids were use to produce the map of flowering Pleroma trees (number of detections of flowering Pleroma for each pixel along the time series), the map of the total number of observations per pixel and the map of the total number of observations without clouds.

To analyse the seasonality of blooming, the detection results were aggregated by month—even with a 5-day frequency there were still too few observations to analyse each annual timing and duration of flowering, and changes of the flowering dates between years were not expected based on the existing botanical information of the species. For each pixel, the number of detections per month were divided by the total number of observations without clouds per month. This enabled to normalized the detection values between zero and one and made sense given that we were not interest in the number of detections but rather in the times of the year when the number of detections was the highest: the peak of the blooming.

To find the characteristics of these time series—one or more blooming peaks and the days when the blooming begins, peaks and stops—the normalized time series of mean monthly observations of flowering Pleroma were filtered using the Fourier transform (FT) (Eq. 1). This decomposition was made the keeping only the annual, bis- and tris-annual frequencies that compose the blooming signal, and to provide a continuous representation of the discrete blooming observations. In other worlds, the Fourier transform of the normalized time series observations enabled to model and compute the values of blooming for each day of the year and better estimate the days blooming started, peaked, and ended. While initially, a decomposition with only annual and biannual frequencies was expected to fit well to the times series (as more than two peaks per year were not expected), it appeared that when the two peaks were close in time (such as in a 2–3 month interval), only annual and biannual frequencies were not sufficient to give a good model of the signal, and the triannual frequency was added to resolve this issue. Furthermore, it was assumed that other periods in the signal were only constituted by noise.

The blooming signal was modelled by the following equation:

begin{aligned} {widehat{bloom}}(t)& = bloom_0 + pow_0 ,left( p_{4} sin left( 2pi frac{1}{4} t + rho _4right) right. & quadleft. + p_{6} sin left( 2pi frac{1}{6} t + rho _6right) + p_{12} sin left( 2pi frac{1}{12} t + rho _{12}right) right) end{aligned}

(1)

with (p_4 + p_6 + p_{12}=1) and for (t=1,ldots ,12 times n), ({widehat{bloom}}) is the filtered blooming time series; (bloom_0) is as an estimate of the mean annual blooming; t is the time in months; (rho _4), (rho _6) and (rho _{12}) are the delay of signal components with periods of 4 months, 6 months and 12 months, respectively; (pow_0) is the power of the signal and (p_4), (p_6) and (p_{12}) are the relative proportions in the signal of the periods of 4 months, 6 months and 12 months, respectively.

To ease optimisation and cohere with the biological significance of our model, some data cleaning and adjustments were made. First, pixels with less than 4 observations over the 4-year period were removed from the analysis. Second, isolated peaks with only 1 or 2 observations during the 4-year period and between months without Pleroma detection were set to 0. Third, all the values of the normalized blooming time series were multiplied by 10, which seemed to ease the convergence of the optimisation algorithm. Fourth, the first months before and after the blooming period were set to a negative value equal to − 0.15 × the maximum value of the pixel time series. This was made based on the assumption that blooming is quite fast (based on the observation data) and happens between the month when the blooming is first observed and the previous month (and when blooming is last observed and the next month), and it forced the model to go below the 0 value during this period. Fifth, a weight was added to each point corresponding to its value, as we were interested in estimating accurately the peak value. A weight value of 0 was set to the month with a 0 value, and a weight of 1 was set to the months with negative blooming values (pre- and post-peak months). Finally, to facilitate the optimization, the time series of values and weights was replicated 3 times (n = 3). While this did not change the periodicity of the signal, it enabled to estimates better the value of the first and last month of the time series, as well as to ease optimisation. The parameters (bloom_0), (pow_0), (p_4), (p_6), (p_{12}), (rho _4), (rho _6) and (rho _{12}) were then estimated by a weighted least square minimization using the weights previously described. The accuracy of the model was given by the weighted R2 computed with the observed and predicted values of blooming for each month. As the Fourier transform is highly flexible, it can adjust almost perfectly to the data: the median of weighted R2 was close to 1 (with a 95% confidence interval—from percentile 2.75 to 97.5—of 0.86.0 to 1).

After the decomposition of the blooming signal, a daily time series of 1 year of ({widehat{bloom}}) was computed with the obtained parameters (365 values) (Fig. 14). Daily values of months with a weight of 0 were set to 0 as well as predicted negative values. Then all peaks and pits were identified in the ({widehat{bloom}}) time series. A peak or pit is an observation that is preceded and followed by, respectively, lower or higher observations62,63. For each peak, the day of start and stop were identified using the pit values. After this analysis, we were able to describe the blooming time series: that is, if there were one or more peaks and, for each peak, the days when the blooming initiated, peaked and stopped.

To determine if different populations could be identified based on flowering timing, a cluster analysis was performed. A classical K-means clustering analysis was made on a dataset containing, for each pixel where Pleroma were detected, the days of start, peak, and end of blooming, the associated normalized blooming values and the xy coordinates of the pixels. If there were two peaks for a pixels, a line for each peak was created in the dataset. As the xy coordinates were in metres, they carried most of the variance in the dataset. To avoid the artefact of having clusters based only on the distance between pixels, the xy coordinates were divided by 100,000 and rounded to the nearest unit. Before the clustering analysis, all variable were scaled and centered. The number of clusters was determined based on the curve representing the total within-cluster sum of squares as a function of the number of clusters, and also to have the maximum number of clusters.

To describe the association of Pleroma trees with environmental variables, we first reclassified each environmental variables into 10 classes according to the variable’s quantiles. Then a bootstrap procedure was applied. For the number N of Pleroma trees detections, N random point locations were sampled within the Atlantic Forest domain, and the value of each environmental variable at each point was extracted and stored. This operation was repeated 100 times. It enabled to compare the number of Pleroma trees in each quantile class with the mean and gave us a 95% confidence interval for the number of points obtained by random spatial sampling in each class. Using the elevation as an example, the null hypothesis of no spatial association between Pleroma trees and elevation was rejected at a level of 0.05% if the number of Pleroma trees in a quantile class of the elevation was outside the (0.025, 0.975) quantiles of the empirical distribution of elevation obtained by random location sampling in the same class. The same analysis of association with the environmental variables was made for the Handroanthus population identified by the K-means clustering analysis.

All analyses were performed using R project software55.

Source: Ecology - nature.com